DISTNUM2 - Editorial

kevinsogo · May 16, 2016, 3:18am

PROBLEM LINK:

Author: Misha Chorniy
Tester: Kevin Atienza
Translators: Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza

DIFFICULTY:

Hard

PREREQUISITES:

Range trees, segment trees, functional data structures, persistent segment trees

PROBLEM:

You are given an array A[1\ldots N] of positive integers. You have to answer Q queries on it of the following type:

Given l, r and k, let S denote the sorted list of distinct elements of A[l\ldots r]. What is the k th number in this list? If it doesn’t exist, output -1.

Additionally, each query partly depends on the answer to the previous query, so you’re forced to answer the queries in the given order (except in subtask 3).

QUICK EXPLANATION:

First, normalize and compress the values of A[1\ldots N], so they are in \{0, 1, \ldots, N-1\}.

The answer for a query is equivalent to the smallest V such that A[l\ldots r] has at least k distinct values, so we can find it using binary search. We just need to answer the following question:

Given l, r, V, how many distinct elements in A[l\ldots r] are \le V? This can be turned into a geometry query. First, we define N points \{(x_i,y_i,z_i) : 1 \le i \le N\}, where (x_i,y_i,z_i) is defined as follows: Suppose A[i] = v. Also, let j be the index of the previous occurrence of v, i.e. A[j] = v and j < i is the largest such value. If no such j exists, set j = 0. Then (x_i,y_i,z_i) = (v,j,i).

The answer to the query above is now the number of points (x_i,y_i,z_i) such that x_i \le V, y_i < l and l \le z_i \le r. Thus, we now have standard geometric range queries! Using 3D range trees, we can preprocess the points in O(N \log^3 N) time and each query can be answered in O(\log^3 N) time, but with binary search it becomes O(\log^4 N) which might be too slow. Instead, we can use fractional cascading or functional/persistent segment trees to reduce these to O(N \log^2 N) and O(\log^3 N), respectively, and with some clever tree traversals we can even reduce query time to O(\log^2 N).

EXPLANATION:

Subtask 1

In the first subtask, we have Q\times N \le 10^7, so solutions that answer each query in O(N) time, or even O(N \log N) time, is acceptable. In this case, the straightforward approach works:

Compute the sorted version of A[l\ldots r].
Remove the duplicates.
Find the k th number in the list, or if the list has less than k elements, output -1.

This takes O(N \log N) time in the worst case, so doing this Q times takes O(QN\log N) time. With a reasonable implementation, this will pass the time limit of the first subtask.

Subtask 2

In the second subtask, we no longer have Q\times N \le 10^7, but instead we have the additional restriction k = 1. It means that we are finding the first element in the sorted list of distinct elements of A[l\ldots r]. Now, regardless of whether there are duplicates or not in A[l\ldots r], this is clearly the same as the minimum element in A[l\ldots r]. In other words, the query reduces to a standard range minimum query, which has lots of possible solutions! Two fast approaches to this are the following:

Approach 1: Construct a segment tree on top of the array, so that each node represents some subarray of A and also contains the minimum among all elements in this subarray. Constructing this tree takes O(N) time, and afterwards, each range minimum query can be answered in O(\log N) time.

Approach 2: Construct a table of minimums, T[1\ldots N][0 \ldots \log_2 N], where T[i][k] is the minimum element in the subarray A[i\ldots i+2^k-1]. This table can be constructed in O(N \log N) time using dynamic programming. Afterwards, each range minimum query can be answered in O(1) time: The minimum value in A[l\ldots r] is \min(T[l][k], T[r-2^k+1][k]), where 2^k is the largest power of two \le r - l + 1.

Asymptotically faster solutions exist, but they are much complicated, and in any case, this solution doesn’t apply to other subtasks, so there isn’t much use in them. You can read about faster solutions here.

Subtask 3 & 4

Now, let’s move on to answering the hardest subtask, which is the original problem without any additional restrictions. Note that subtask 3 is just an offline version of the problem, which may permit some simpler solutions, but we won’t discuss them here.

Binary search

The first step is realizing that a query can be converted into a form that is suitable for binary search. Each query is equivalent to the following:

Find the lowest value V such that the number of distinct elements of A[l\ldots r] that are \le V is at least k.

This is certainly solvable with binary search for the value V, as long as we can answer the following question efficiently:

Given l, r and V, how many distinct elements of A[l\ldots r] are \le V?

So let’s focus on this query, which now looks more like a geometric range query. Unfortunately, the word “distinct” prevents us from converting this problem directly into a geometric query. Without “distinct”, one fast solution for this problem would be:

For preprocessing: for each 1 \le i \le N, add the point (x_i,y_i) = (A[i],i).
For each query (l,r,V): count the number of points in the rectangle [0,V]\times [l,r].

Thus we have reduced each query to 2D geometric queries, for which there are some well-known solutions, for example using 2D range trees.

But since we only need to count distinct elements, we need to be more creative with our query. The solution above counts duplicate entries multiple times, so we must find a solution that doesn’t do that. One way is to only count the first occurrence of each distinct value in the range A[l\ldots r].

How do we accomplish this? Well, for a particular value v, suppose A[i] = v for some l \le i \le r. For A[i] to be the first occurrence of v in A, the following additional condition must hold true:

If j is the largest index < i such that A[j] = v, then j < l. (If A[i] is the first occurrence of v in the whole array, we can set j = 0.)

Thus, we can now refine our geometric query above to the following:

For preprocessing: for each 1 \le i \le N, let j be the largest index < i such that A[j] = A[i], or j = 0 if no such index exists. Then add the point (x_i,y_i,z_i) = (A[i],j,i).
For each query (l,r,V): count the number of points in the 3D “rectangle” [0,V]\times [0,l-1]\times [l,r].

Now we have properly converted the query into a 3D geometric range query!

Another thing: due to the binary search, whatever complexity we get in solving this problem will get multiplied by O(\log A_{\max}) time. We can reduce this to just O(\log N) by compressing the values of A to the range [0\ldots N-1]. This preprocessing step can be done in O(N \log N) time, and when printing the answer we just need to remember to revert back to the original list of values.

Geometric queries

Our goal is to answer 3D range queries. Specifically, given N points \{(x_i,y_i,z_i) : 1 \le i \le N \}, answer the following kind of query:

How many points are in the range [a,b]\times [c,d]\times [e,f]?

The usual way of solving k-dimensional range queries is with k-dimensional range trees. You can think of a 1D range tree as a simple segment tree, and a k-dimensional range tree as a balanced tree of (k-1)-dimensional range trees. Such a structure on N points can be constructed in O(N \log^k N) time, and each k-dimensional “rectangle” query can be answered in O(\log^k N) time, by using the usual segment tree recursion at each dimension of the tree.

The following is a pseudocode of a 2D range tree:

// 1D tree
// this is just a normal segment tree
class Tree1D:
    constructor(Y, i, j):
        // Y is the sorted list of values
        // Y[i..j] is the relevant range
        this.i = Y[i]
        this.j = Y[j]
        if i == j:
            // this is a leaf
            this.count = 1
            this.l = this.r = NULL
        else:
            k = (i + j) / 2
            this.l = new Tree1D(Y, i, k)
            this.r = new Tree1D(Y, k+1, j)
            this.count = this.l.count + this.r.count

    range_query(a, b):
        // count the number of elements in range [a,b]
        if a <= this.i and this.j <= b:
            return this.count
        else if this.j < a or b < this.i:
            return 0
        else:
            return this.l.range_query(a, b) + this.r.range_query(a, b)

// 2D tree
class Tree2D:
    constructor(P, i, j):
        // P is the list of points, sorted on "x"
        // P[i..j] is the relevant range
        this.i = P[i].x
        this.j = P[j].x
        Y = sorted list of elements of [P[i].y, P[i+1].y, P[i+2].y, ..., P[j].y]
        this.tree = new Tree1D(Y, 0, Y.length-1)
        if i == j:
            this.l = this.r = NULL
        else:
            k = (i + j) / 2
            this.l = new Tree2D(P, i, k)
            this.r = new Tree2D(P, k, j)

    range_query(a, b, c, d):
        // count the number of points in the rectangle [a,b] x [c,d]
        if a <= this.i and this.j <= b:
            return this.tree.range_query(c, d)
        else if this.j < a or b < this.i:
            return 0
        else:
            return this.l.range_query(a, b, c, d) + this.r.range_query(a, b, c, d)

Notice that both classes have very similar implementations, but each node for Tree2D contains a Tree1D which handles the “y” values. You can probably figure out how to construct 3D (or higher-dimensional) range trees from this.

Also, this construction takes O(N \log^2 N) time because of the sorting. Afterwards, each range_query call runs in O(\log^2 N) time.

Unfortunately, in our case, we will need 3D range tree, and each query takes O(\log^3 N) time. And because we’re doing a binary search on top of that, each query for the original problem takes O(\log^4 N) time, which is probably too slow to pass the time limit.

Instead, we will describe two techniques to improve the running time of each 3D range query.

Technique 1: Fractional cascading

We can use fractional cascading to reduce a 3D range query from O(\log^3 N) to O(\log^2 N). We can illustrate this idea using 2D range trees.

We can think of a 2D range tree as a segment tree on the x coordinate, where each node also contains a segment tree, this time on the y coordinate. The first step is to replace the segment trees on y with simple sorted arrays, and replace range queries with simple binary searches.

Now this doesn’t reduce the asymptotic running time, but it helps reduce the complexity of our structure. For example, we can now construct a 2D range tree in O(N \log N) time instead of O(N \log^2 N) by using a merge-sort-like algorithm; Suppose we want to construct a 2D range tree over the points \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}, and suppose this is sorted by x and then by y. Then:

class RangeTree2D:
    constructor(i, j, xi, xj, y_list, left, right):
        this.i = i
        this.j = j
        this.xi = xi
        this.xj = xj
        this.y_list = y_list
        this.left = left
        this.right = right

def construct(P, i, j):
    // constructs the the tree from the i'th to the j'th point

    if i == j:
        return new RangeTree2D(i, i, P[i].x, P[i].x, [P[i].y], NULL, NULL)
    else:
        k = (i + j) / 2
        left = construct(P, i, k)
        right = construct(P, k+1, j)
        y_list = merge(left.y_list, right.y_list) // merge sort's "merge"
        return new RangeTree2D(i, j, P[i].x, P[j].x, y_list, left, right)

Notice that instead of sorting y_list per node, we just “merge” the y_lists of the two children in linear time, because they’re already sorted.

Now, here comes the “fractional cascading” part. The reason each range query runs in O(\log^2 N) time is that we perform a binary search over every node we visit. But we can speed this up by noticing that we are binary searching for the same value over and over again, and that we are only visiting nodes in a single path from the root to the leaf! The idea is to use this fact so that after a single binary search at the root, we don’t have to perform another binary search again.

We do this by setting up lots of pointers. For each node, and for each i from 0 to y_list.length-1, we keep track of two values.

The largest index j such that left.y_list[j] <= y_list[i].
The largest index j such that right.y_list[j] <= y_list[i].

Using these pointers, once we binary search for a value at the root, we don’t have to binary search for this value at the child—we just follow the pointer! This allows us to reduce range query complexity to O(\log N).

Now, I admit that I skimmed over some details, but I hope you at least get the idea behind.

By performing this technique for every 2D range tree inside our 3D range tree, our range query reduces now to O(\log^2 N).

Technique 2: Functional data structures

To improve the running time, we will try to use a different approach, via functional data structures. Functional data structures are data structures that are implemented under the functional programming paradigm. One of the key characteristics of functional programming is that functions don’t have “side effects”, and variables don’t change once assigned. (In fact, the term “variable” becomes a misnomer.) Data structures built under this paradigm are consequently persistent, i.e., older versions of the data structure are still fully available.

To write our structures under this paradigm, we need only ensure that we keep our functions from having side effects, like changing some global variables or attributes. For example, to implement a functional 1D segment tree, instead of updating the attributes when we perform an update, we simply fully copy the nodes that have been updated. On an update, this makes us copy O(\log N) nodes, but the upside is that we still have the old segment tree fully available. Note that the running time of an update and query is still O(\log N)!

Once again, we’ll illustrate with 2D range trees. Suppose we want to make a 2D range tree over \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}, and assume this is sorted in x, and then in y.

Each 2D range query counts the number of points some rectangle [x_1,x_2]\times [y_1,y_2]. If all coordinates are nonnegative, this is equal to the number of points in [0,x_2]\times [y_1,y_2] minus the number of points in [0,x_1-1]\times[y_1,y_2], so we can just assume that x_1 = 0.

If the queries were offline, i.e. we know all queries beforehand, then we can answer all query rectangles in increasing order of x_2. The way we do this is to maintain a segment tree on the y s, initially empty. First, we sort all queries and points according to x, and then process them in that order:

For every point (x_i,y_i), add y_i to the segment tree.
For every query rectangle [0,x_2]\times [y_1,y_2], count the number of values in the segment tree in the range [y_1,y_2].

In other words, we’re effectively turning the x coordinate into a “time” dimension. It’s not hard to see why this works.

Unfortunately, the queries are not offline, because some queries are dependent on the result of previous queries (note that we’re performing binary search). The solution to this is to use a functional/persistent segment tree to construct all possible versions of the segment tree, by inserting the (x_i,y_i) points in order. Then, for a query rectangle [0,x_2]\times [y_1,y_2], we simply find the version of the segment tree corresponding to x_2, and query that segment tree!

Using this approach for our 2D range tree makes each query O(\log N), and for a 3D range tree, each range query O(\log^2 N).

Refined “binary search”

The techniques above allow us to reduce the running time of a 3D range query from O(\log^3 N) to O(\log^2 N), but with binary search, our running time is still O(\log^3 N), which might still be too slow. Our goal is to reduce this further.

In fact, we can do this by performing a more clever version of our binary search! The key insight is that we are binary searching over the first coordinate, x: Remember that we are finding the lowest value V such that [0,V]\times [0,l-1]\times [l,r] has at least k points.

Our original approach is the following:

// tree3d is our 3d range tree

def find_answer(l,r,k):
    V_low = -1
    V_high = N
    while V_high - V_low > 1:
        V_mid = (V_low + V_high) / 2
        count = tree3d.range_query_3d(0,V_mid, 0,l-1, l,r)
        if count >= k:
            V_high = V_mid
        else:
            V_low = V_mid
    return V_high

which is just a normal binary search. However, we can use the structure itself as our guide for binary search! Here’s an illustration:

// here, ".tree2d" is the underlying 2d range tree of a 3d range tree

def find_answer(l,r,k):
    curr = tree3d
    while curr.xi != curr.xj:
        count = curr.left.tree2d.range_query_2d(0,l-1, l,r)
        if count >= k:
            curr = curr.left
        else:
            curr = curr.right
            k -= count
    return curr.xi

To help you understand this better, this is the same sort of technique that allows us to find the k th smallest value in a normal binary search tree, as long as we keep track of the size of each subtree at each node.

Since each 2D range query runs in O(\log N) time, and we perform O(\log N) of those queries, the overall running time reduces to O(\log^2 N) time, which is good enough!

Time Complexity:

O((N + Q)\log^2 N)

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter
Tester

mightymercado · May 16, 2016, 3:14pm

Can this be solved using persistent segment trees (PST)? MKTHNUM on Spoj is almost the same problem as this one except that the numbers are guaranteed to be distinct. There is a few tutorials online on how to solve MKTHNUM using Pst. I am wondering if this problem (DISTNUM2) can be solved using PST too or has anyone ACed it using PST? Thanks! Great problem and editorial.

mightymercado · May 16, 2016, 3:27pm

@himanshujaju
Can you explain your approach?

himanshujaju · May 16, 2016, 3:56pm

This is my approach (and a lot of others used this) :

Coordinate compression on the values to map array values in range : [1,N].
Divide the array into blocks using sqrt decomposition. For each block maintain a bitset (you may have to custom code it). The bitset has N bits and marks whether the ith value (after coordinate compression) is present or not.
For all pairs of block numbers (i,j) find the distinct elements. For any (L,R), it is simply the bitwise OR of all bitsets in [L,R].
Follow normal sqrt decomposition for queries now. Generate the bitset for the query range and using builtin_popcount, you can find the answer in O(N/64) for every query.

Time complexity : O(N * sqrt(N) + N * (N/64)).

grebnesieh · May 16, 2016, 4:01pm

It can be solved using persistent segment trees. More precisely, a persistent segment tree whose nodes are themselves segment trees which make up another persistent segment tree.

AC solution: CodeChef: Practical coding for everyone

EDIT: This question is basically a combination of kth order (SPOJ MKTHNUM) and distinct count (SPOJ DQUERY)

Here is a very good explanation of MKTHNUM.

DQUERY has a similar approach but instead of an array that represents the count of elements you use an array that represents the count of prev[i] value of elements. Where, given an array A, prev[i] is the maximum index j < i such that A[i] = A[j].

Try and solve these beforehand if you haven’t yet or at least make yourself familiar with the solutions.

For this problem we implement a persistent segtree like the one in MKTHNUM but instead of storing count it stores a segtree like the one we used in DQUERY.

So when it has to decide which direction to go it doesn’t just compare count of all elements but count of all distinct elements. The key idea is that the segtree (which tracks prev[i]) of first N values over a given interval differs in logn values from the segtree over the first N - 1 values over the same interval.

Hope this clears it up.

mightymercado · May 16, 2016, 6:28pm

@grebnesieh Yes, can you expain it like how @himanshujaju explained his approach? Thanks!

prudvinit · May 17, 2016, 3:21pm

I didn’t understand the line “Find the lowest value V such that the number of distinct elements of A[l…r] that are ≤V is at least k”. I think the line should be “Find the lowest value V in the range A[l…r] such that the number of distinct elements of A[l…r] that are <=V is exactly k”

s4shyam95 · May 17, 2016, 3:46pm

Hi, can someone explain me the offline approach required to answer subtask 3.
I know Mo’s algorithm, so I can sort queries in each sqrt(N) bucket and process them in order. But how exactly can I mantain a structure to add/remove elements to the set , and give me the K-th element in the set, each in no more than O(logN)

Can someone help me out ?

grey_code · May 17, 2016, 5:29pm

About the offline approach to solve subtask 3:

First use coordinate compression on the array elements so that all array elements are to mapped to numbers 1 to N. Then create an array of previous and next occurance for all indices in the array. Then use MO’s algorithm. For this create a segtree with N leaf nodes (i.e. 2*N-1 total nodes ) and a node representing range (i,j) will contain count of the number of array elements in range (i,j) present for current position of mo-left and mo-right. Now suppose we want to increase mo-right then we see that whether the previous occurance of the array element at mo-right+1 is less than mo-left. If yes then add 1 to all the ranges in segtree including this array element. This takes O(log n) time. Once mo-left = l and mo-right = r ( l and r represent query ) then query on the segtree takes O(log n) time.

s4shyam95 · May 17, 2016, 7:01pm

@grey_code Thanks! I undetstood the part where you find previous and next occurence of the number,

But I am having some trouble understanding what your seg tree is doing.
So if I understood this right, you can get number of distinct elements in range [i,j] for current mo-left and current mo-right.
How does that help you find the K-th element in the range [i,j].

Thanks again!

grey_code · May 18, 2016, 3:05am

@s4shyam95

You understood it right that for certain mo-left and mo-right I have a segtree each of whose node stores number of distinct elements in its corresponding range [i,j]. Clearly finding kth largest element means kth largest element in range [1,N] (Remember all array elements are mapped to numbers 1 to N). This is a basic query. Now if I query giving k to the root node it will check if its left child has more than k elements. If yes it will go to the left child and repeat the procedure. If left child has less than k elements it will go the right child with parameter k-(count of left child) and then repeat the procedure. This process will continue until the leaf node is reached, hence taking O(height) = O(log n) time.

s4shyam95 · May 18, 2016, 1:19pm

@grey_code Thanks a lot, that was great explanation, I understood if correctly once I read this followed by seeing your code for it.

Though I believe that the prev and next can be eliminated by using a frequency map for current range i.e. mo-left to mo right.

vinay_goel21 · June 12, 2016, 2:00pm

@grebnesieh Thanks a lot for sharing your solution. Understood a whole new way to use dynamic pointers. I solved it after reading the editorial in N* log^ 3( N) and still got AC:) but I still couldn’t get how to reduce memory consumption to log( N) using persistent segment tree. Your approach taught me both the log( N) memory as well as N* log^ 2( N) approach. Keep sharing…:D.

Here’s my solution( N* log^ 3( N) ): CodeChef: Practical coding for everyone

I’d love to get any reviews on my solution to improve the approach or any explanation on how did it pass the time limit.

himanshujaju · May 16, 2016, 3:20pm

There is another approach using bitset (you have to code your own bitset though) and sqrt decomposition, which is pretty easy to code!

nilesh3105 · May 16, 2016, 5:23pm

Can you explain it more?

grebnesieh · May 16, 2016, 6:53pm

Added explanation.

mightymercado · May 17, 2016, 4:05pm

I am familiar with MO’s algorithm. You can use c++ map to store both the presence and the count of an integer. So everytime you increase or decrease the range, you either decrease or increase the count, and when the count is 0, you basically remove it. The number of distinct integers is basically the size of the map. The problem is to get the kth integer in the map, unforunately, c++ map doesn’t support random access. One solution is to use an auxillary vector that works together with the map to allow random access. let me know if you want me to explain in more detail.

s4shyam95 · May 17, 2016, 6:53pm

@mightymercado Yes please, can you explain or give me a link that explains the how to use a auxillary vector that works together with the map to allow random access.

Thanks a lot!

vsp4 · May 17, 2016, 7:42pm

@mightymercado, map with logn insertion and logn order statistics doesn’t work unfortunately. You could use policy based data structure for this as in: CodeChef: Practical coding for everyone However this tle’s for 3 of the tasks. I am wondering whether my mo algorithm is correctly set up or something wrong with it
However seeing others solution for 60 points, storing a binary indexed tree works

mightymercado · May 17, 2016, 7:45pm

Hmm sorry. I was wrong about auxillary array. I think you need to use an order tree Order statistic tree - Wikipedia