DISTNUM2 - Editorial

himanshujaju · May 16, 2016, 3:56pm

This is my approach (and a lot of others used this) :

Coordinate compression on the values to map array values in range : [1,N].
Divide the array into blocks using sqrt decomposition. For each block maintain a bitset (you may have to custom code it). The bitset has N bits and marks whether the ith value (after coordinate compression) is present or not.
For all pairs of block numbers (i,j) find the distinct elements. For any (L,R), it is simply the bitwise OR of all bitsets in [L,R].
Follow normal sqrt decomposition for queries now. Generate the bitset for the query range and using builtin_popcount, you can find the answer in O(N/64) for every query.

Time complexity : O(N * sqrt(N) + N * (N/64)).

grebnesieh · May 16, 2016, 4:01pm

It can be solved using persistent segment trees. More precisely, a persistent segment tree whose nodes are themselves segment trees which make up another persistent segment tree.

AC solution: CodeChef: Practical coding for everyone

EDIT: This question is basically a combination of kth order (SPOJ MKTHNUM) and distinct count (SPOJ DQUERY)

Here is a very good explanation of MKTHNUM.

DQUERY has a similar approach but instead of an array that represents the count of elements you use an array that represents the count of prev[i] value of elements. Where, given an array A, prev[i] is the maximum index j < i such that A[i] = A[j].

Try and solve these beforehand if you haven’t yet or at least make yourself familiar with the solutions.

For this problem we implement a persistent segtree like the one in MKTHNUM but instead of storing count it stores a segtree like the one we used in DQUERY.

So when it has to decide which direction to go it doesn’t just compare count of all elements but count of all distinct elements. The key idea is that the segtree (which tracks prev[i]) of first N values over a given interval differs in logn values from the segtree over the first N - 1 values over the same interval.

Hope this clears it up.

mightymercado · May 16, 2016, 6:28pm

@grebnesieh Yes, can you expain it like how @himanshujaju explained his approach? Thanks!

prudvinit · May 17, 2016, 3:21pm

I didn’t understand the line “Find the lowest value V such that the number of distinct elements of A[l…r] that are ≤V is at least k”. I think the line should be “Find the lowest value V in the range A[l…r] such that the number of distinct elements of A[l…r] that are <=V is exactly k”

s4shyam95 · May 17, 2016, 3:46pm

Hi, can someone explain me the offline approach required to answer subtask 3.
I know Mo’s algorithm, so I can sort queries in each sqrt(N) bucket and process them in order. But how exactly can I mantain a structure to add/remove elements to the set , and give me the K-th element in the set, each in no more than O(logN)

Can someone help me out ?

grey_code · May 17, 2016, 5:29pm

About the offline approach to solve subtask 3:

First use coordinate compression on the array elements so that all array elements are to mapped to numbers 1 to N. Then create an array of previous and next occurance for all indices in the array. Then use MO’s algorithm. For this create a segtree with N leaf nodes (i.e. 2*N-1 total nodes ) and a node representing range (i,j) will contain count of the number of array elements in range (i,j) present for current position of mo-left and mo-right. Now suppose we want to increase mo-right then we see that whether the previous occurance of the array element at mo-right+1 is less than mo-left. If yes then add 1 to all the ranges in segtree including this array element. This takes O(log n) time. Once mo-left = l and mo-right = r ( l and r represent query ) then query on the segtree takes O(log n) time.

s4shyam95 · May 17, 2016, 7:01pm

@grey_code Thanks! I undetstood the part where you find previous and next occurence of the number,

But I am having some trouble understanding what your seg tree is doing.
So if I understood this right, you can get number of distinct elements in range [i,j] for current mo-left and current mo-right.
How does that help you find the K-th element in the range [i,j].

Thanks again!

grey_code · May 18, 2016, 3:05am

@s4shyam95

You understood it right that for certain mo-left and mo-right I have a segtree each of whose node stores number of distinct elements in its corresponding range [i,j]. Clearly finding kth largest element means kth largest element in range [1,N] (Remember all array elements are mapped to numbers 1 to N). This is a basic query. Now if I query giving k to the root node it will check if its left child has more than k elements. If yes it will go to the left child and repeat the procedure. If left child has less than k elements it will go the right child with parameter k-(count of left child) and then repeat the procedure. This process will continue until the leaf node is reached, hence taking O(height) = O(log n) time.

s4shyam95 · May 18, 2016, 1:19pm

@grey_code Thanks a lot, that was great explanation, I understood if correctly once I read this followed by seeing your code for it.

Though I believe that the prev and next can be eliminated by using a frequency map for current range i.e. mo-left to mo right.

vinay_goel21 · June 12, 2016, 2:00pm

@grebnesieh Thanks a lot for sharing your solution. Understood a whole new way to use dynamic pointers. I solved it after reading the editorial in N* log^ 3( N) and still got AC:) but I still couldn’t get how to reduce memory consumption to log( N) using persistent segment tree. Your approach taught me both the log( N) memory as well as N* log^ 2( N) approach. Keep sharing…:D.

Here’s my solution( N* log^ 3( N) ): CodeChef: Practical coding for everyone

I’d love to get any reviews on my solution to improve the approach or any explanation on how did it pass the time limit.

himanshujaju · May 16, 2016, 3:20pm

There is another approach using bitset (you have to code your own bitset though) and sqrt decomposition, which is pretty easy to code!

nilesh3105 · May 16, 2016, 5:23pm

Can you explain it more?

grebnesieh · May 16, 2016, 6:53pm

Added explanation.

mightymercado · May 17, 2016, 4:05pm

I am familiar with MO’s algorithm. You can use c++ map to store both the presence and the count of an integer. So everytime you increase or decrease the range, you either decrease or increase the count, and when the count is 0, you basically remove it. The number of distinct integers is basically the size of the map. The problem is to get the kth integer in the map, unforunately, c++ map doesn’t support random access. One solution is to use an auxillary vector that works together with the map to allow random access. let me know if you want me to explain in more detail.

s4shyam95 · May 17, 2016, 6:53pm

@mightymercado Yes please, can you explain or give me a link that explains the how to use a auxillary vector that works together with the map to allow random access.

Thanks a lot!

vsp4 · May 17, 2016, 7:42pm

@mightymercado, map with logn insertion and logn order statistics doesn’t work unfortunately. You could use policy based data structure for this as in: CodeChef: Practical coding for everyone However this tle’s for 3 of the tasks. I am wondering whether my mo algorithm is correctly set up or something wrong with it
However seeing others solution for 60 points, storing a binary indexed tree works

mightymercado · May 17, 2016, 7:45pm

Hmm sorry. I was wrong about auxillary array. I think you need to use an order tree Order statistic tree - Wikipedia

mightymercado · May 17, 2016, 7:48pm

Wow, you got AC with O(N*N/64) :o

kevinsogo · May 17, 2016, 10:51pm

The editorial also includes an approach using persistent segment trees. (“functional data structures”)

kevinsogo · May 17, 2016, 10:52pm

Using “at least” in this case is equivalent, but has the added benefit that if the statement is true for V, then the statement is true for V’ > V.