COT5-Editorial

cot5
editorial
feb14
hard
segment-tree
treap

#1

PROBLEM LINK:

Practice
Contest

Author: Tang Feihu
Tester: Hiroto Sekido, Gerald Agapov
Editorialist: Ajay K. Verma

DIFFICULTY:

Hard

PREREQUISITES:

Treap, Segment Tree

PROBLEM:

Support the following operation on a treap which is a a binary search tree according to keys, and a max-heap according to weights:

  1. insert a node with key k and weight w,
  2. delete a node with key k,
  3. find the distance between two nodes with key k1 and k2.

QUICK EXPLANATION:

Refer to the next section.

EXPLANATION:

Since the treap is a tree, the distance computation between two nodes can be reduced to the problem of finding lowest common ancestor of two nodes, and depth of a node. If the lowest common ancestor of node u and v is w, then

dist (u, v) = h (u) + h (v) - 2 h (w).

Lowest common ancestor (LCA):

In this section we discuss how to compute the lowest common ancestor of two nodes in a dynamic treap. The treap is a binary search tree according to the keys, hence, if we make an array of all treap nodes and sort them according to the key values, the array will represent the in-order of the treap.

Suppose we have two nodes u and v, and their indices is this in-order array are ku and kv (ku < kv). It is easy to note that the index kw of the lowest common ancestor node w of u and v must lie between ku and kv, i.e., ku < kw < kv. Further, the index kw will have the highest weight in the subarray [ku, kv]. Hence, using the in-order array one can easily find the lowest common ancestor of two nodes by doing a Range maximum query. for the range [ku, kv]. However, this will work only when the treap is static. In our problem the treap is changing, and hence its in-order array will also keep changing.

Even though the treap is changing, we can do an offline processing of the queries to find the set of all possible keys that will ever exist in the treap, and create an array with all keys present. Initially, we assign zero weight to all elements in the array. When a node (k, v) is inserted in the treap, we update the weight of the corresponding element in the array from 0 to v. Similarly, when a node (k, v) is deleted, we update the weight the corresponding element in the array from v to 0. In order to find the lowest common ancestor of u and v, we need to find the element in the subarray [ku, kv] with highest weight. Note that, now the array is fixed, we are only updating the values of its elements.

Hence, now the problem of finding lowest common ancestor is reduced to find the range maximum in a dynamic array. This can be achieved using a segment tree, such that both update and lowest common ancestor query take O (lg N) time, where N is the size of the array.

Depth Computation:

In this section, we discuss how to compute the depth of a node in a dynamic treap. First, we show how to compute the depth in a static treap, and then discuss how to modify our approach to handle the case of dynamic treap.

Let us start with an example. Suppose the treap contains the following nodes:
(1, 8), (2, 10), (3, 7), (4, 6), (5, 12), (6, 9), (7, 8), (8, 11)

The in-order array will be contain the elements in the same order as shown above. The node (5, 12) has the highest weight, hence, it will be the root of the treap. The left subtree of the treap will contain nodes (1, 8), (2, 10), (3, 7), (4, 6), while the right subtree of the treap will contain (6, 9), (7, 8) and (8, 11).

Lets look at the left subtree. Here (2, 10) is the node with highest weight, hence it will be the root, with left subtree containing (1, 8) and right subtree containing (3, 7) and (4, 6).

Observe that two nodes u and v can not be part of the same subtree, if there exist a node between them in the in-order array having weight higher than that of the two nodes. Because this node with higher weight (actually the node with the highest weight between the two lighter nodes) will become the root, and the two nodes with lower weights will go to different subtrees. Further, if there are no nodes in the in-order array between u and v, which has higher weight than both of u and v, then u and v will be in the same subtree having either u or v (whichever has higher weight) as root.

Based on the above observation, we can easily compute the depth of a node using the in-order array. If we want to compute the depth of a node v, which is at position kv, then we need to find how many indices kw exist such that the node w at index kw has higher weight than that of v, but no other nodes in the subarray [kv, kw] has higher weight than both of v and w. The number of such w’s will be the depth of the node v.

Let us see how to compute the depth of (3, 7) in above example using this approach. There are only two possible subarrays [kv, kw]:
(2, 10) to (3, 7)
(3, 7) to (5, 12)

Hence, the depth of (3, 7) is 2. The immediate parent of (3, 7) will be (2, 10) and the next ancestor will be (5, 12).

The following snippet computes the depth of a node v at index kv using the in-order array A.

int depth(int kv) {
    int d = 0;
    int mx = A[kv].weight;
    
    // Look for subarrays of the form [kv, -].
    for (int i = kv + 1; i <= N; ++i) {
        if (A*.weight > mx) {
            ++d;
            mx = A*.weight;
        }
    }
    
    // Look for the subarrays of the form [-, kv].
    mx = A[kv.weight];
    for (int i = kv - 1; i > 0; --i) {
        if (A*.weight > mx) {
            ++d;
            mx = A*.weight;
        }
    }
    
    return d;
}

So, now we know how to compute the depth of a node in the treap. However, the above snippet is not very efficient as it takes O (N) to compute the depth. Next, we discuss how to reduce the complexity of depth computation.

Data structure to reduce the complexity of depth computation:

In the above snippet, there are two traversals to compute the depth of a node. We only consider the first traversal, the other one can be done in a similar way. So the problem is that we are given an array, and for a given index k, we want to find out how many times the maximum of subarray [k, r] will change where r is varying from k + 1 to N.

The problem can be solved using a segment tree. We create a segment tree corresponding to array, where each node in the segment tree corresponds to a subarry. The leaf nodes correspond to the subarrays of size one, and the parent of two nodes corresponds to the subarray which can be obtained by merging the subarrays corresponding to the two children.

Each node of the segment stores some information corresponding to its subarray. More precisely, this information is the “prefix maximum” function of the subarray. The “prefix maximum” function is already described in the editorial of problem. For a given array A, we define the “prefix maximum” function PM as follows:
PM(i) = max(A[0], A[1], …, A*).

Clearly, the PM() function is a step function, and we only need to store the positions where it makes step and the value of the function at these positions.

Using this information the number of of subarrays [k, r] can be computed easily. In order to compute the number of subarrays [k, r] for a given k, we need to start from the leaf node in the segment tree corresponding to the subarray [k, k], and move towards the root, while maintaining mx, and d as in the above snippet. This is also shown in the snippet below:

int right_depth(segtree *tree, int k) {
    int d = 0;
    node *n = tree->get_leaf(k);
    
    // the highest value in the subarray corresponding to node n.
    int mx = n->max();
    
    while (n->parent) {
        node *p = n->parent();
        
        // We are only interested in right subarrays.
        if (n is right child of p) {
            n = p;
            continue;
        }
            
        // how many elements in the PM function of the right child are greater
        // the maximum value seen so far.
        d += p->right->rank(mx);
        
        // update the maximum value.
        mx = max(mx, p->right->max());
        n = p;
    }
    
    return d;
}

Since the height of segment tree is O (lg N). The above code will make at most O (lg N) iterations. Also in each iteration we need to compute the rank of an element. If the segment tree nodes stores the PM() function using a sorted array of steps, then this can be done in O (lg N) time as well, resulting in O (lg2 N) complexity of the above operation.

Now let us see if we can avoid storing the PM() function at each node in the segment tree. The only information that we about the PM() function is the number of elements in it which are greater than a given value h. Next, we show how to compute this information without storing the whole PM() function. We store only two values in a segment tree node:

  1. The maximum element of the corresponding subarray represented by mx, and
  2. number of steps in the PM() function represented by len.
struct node {
    int rank(int h);

    int mx;
    int len;
  
    // Use pointers for simplicity.
    node *parent;  
    node *left;
    node *right;
};

Suppose we want to calculate the rank of a given value h, i.e., the number of elements in the PM() function which are greater than h. If h = 0, then this will be the same as the number of steps, i.e., len. If h >= mx, then this value will be zero. Now consider the case when 0 < h < mx.

In this case we look at the two children of the node, which correspond to the smaller subarrays. There could be two possible cases:

  1. h >= left->mx. In this case no element in the left subarray is greater than h, so we can ignore it, and the answer will be the rank of h in the right subarray.
  2. h < left->mx. In this case all the elements of the PM() function of this node will be included except the ones which are smaller than h. Note than none of these smaller elements will be from the right subarray. This is because all the elements of right subarray which are present in the PM() function must be greater than left->mx, which in turn is greater than h. Hence, we only need to remove the elements from the left subarray, i.e., the answer would be (len - left->len + left->rank(h)).

Note that, in both cases we have reduced the problem to a smaller subarray which has half of the size of the original subarray. Hence, the complexity of this approach would be O (lg N).

// Returns number of elements of the PM() function which are
// greater than h.
int rank(int h) {
    if (h >= mx)
        return 0;
        
    if (h == 0 || node is a leaf)
        return len;
      
    // Case 1.   
    if (left->mx <= h)
        return right->rank(h);
    
    // Case 2.    
    return (len - left->len) + left->rank(h);
}

Now, we have seen how to compute the depth of a node in a static treap. Let us see how to modify our approach so that it also works for a dynamic treap. The only thing that we need to care about is how to update the mx and len value of a node, when the value of an element in the subarray changes. This can be done easily, we just need to start from the leaf node corresponding to size one subarray, and move towards root while updating all the nodes in the path.

// Updates the value at position k to val.
void update(segtree *tree, int k, int val) {
    node *n = tree->get_leaf(k);
    n->mx = val;
    
    n = n->parent;
    while(n) {
        n->mx = max(n->left->mx, n->right->mx);
        n->len = n->left->len + n->right->rank(n->left->mx);
    }
}

Hence, an update can be done in O (lg2 N) time.

TIME COMPLEXITY:

O (N lg2 N)

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution will be put up soon.
Tester’s solution will be put up soon.


#2

this problem is perfect


#3

Awesome solution! I started similarly, but store distances to the root in some segment tree that supports get the value and add to the segment instead of passing to some tricky things that will allow to compute these distances.

But test data for the problem are weak. My solution has complexity O(N^2 * log N) in the worst case but managed to be almost the fastest among contestants. In more details, let I(k, w) be the number of iterations needed for merge operation called from standard insert of the pair (k, w) to the treap and D(k) be the number of iterations needed for split operation called from standard delete of the node (k, w) from the treap. Then complexity of my approach is sum of I(k, w) and D(k) over all queries, multiplied by log N. The reason why it passed is that I was managed to reduce number of pre-merge/pre-split operations in insert/delete operations which I’m sure is large for some tests (otherwise we would have much more AC solutions). But it still easy to fail my solution.

Namely, Let’s first insert nodes (i, 100000 - 2 * i) and (1e9 - i, 99999 - 2 * i) for i=1…25000. So all edges in the tree will cross each of the line x=x0, 1e5 < x0 < 1e9 - 1e5. Basically the tree looks “zigzaggy”. Now while number of operations permits let’s do the following:

  • Pick random pair of the form (k, w) with 1e5 < k < 1e9 - 1e5 and 1e6 < w < 1e9
  • Insert (k, w) to to the tree
  • Query random pair of node
  • Delete (k, w) from the tree
  • Query random pair of node

The key thing is that each merge and split will completely rebuild the tree from “zig-zag” one to “two-branch” one and vice versa (Sorry I have no time to draw a pictures to illustrate it). Hence around 50000 operations will be needed and we can make around 75000 insertions/deletions after initial 50000 insertions. For this particular test we can hack it by “recognizing” that we have “the same” queries in terms of tree structure and do some caching but the test can be generalized to fail this as well. My solution should work more than an hour for such test (even naive one would be faster :))


#4

My solution is conceptually very similar to the one presented in the editorial. In terms of implementation, however, I didn’t use a segment tree (I thought about using it for a while, but I couldn’t immediately realize how to update the PM function when inserting/deleting an element). Instead, I divided the set of keys into blocks - if we had M keys inserted overall, I used sqrt(M) blocks of sqrt(M) keys each. For each block I maintained the PM staircase function - updating this function was easy in O(sqrt(M)) time (I simply recomputed it from scratch in the block where the key was inserted/deleted). On the other hand, in this model, a query takes O(sqrt(M) * log(sqrt(M))) time - I traversed the blocks maintaining the current maximum. For each block, if the current maximum was below the maximum of the block, I used binary search to see how many steps of the PM function are above the current maximum. I wasn’t sure this solution would fit within the time limit, but I implemented it anyway thinking that I would try to optimize it later if necessary - it turned out not to be necessary.


#5

I haven’t yet read all this editorial, but, THANK YOU!! REALLY!! @djdolls, these are really, really amazing editorials :smiley: With these sorts of editorials I think people can really learn new concepts and learn with who knows best!! Thanks for this amazing work!


#6

Thanks Bruno! I am glad you liked them.