PROBLEM LINK:Author: Utkarsh Lath DIFFICULTY:Medium PREREQUISITES:Heavy light decomposition, Binary index tree, Segment tree, Caterpillar tree PROBLEM:We are given a rooted tree, where each node is assigned a unique number in the beginning (however after some operations on the tree, two nodes may have the same number). The weight of an edge is either 0 or 1 depending on whether the end points of the edge are assigned the same number. We need to support two kind of queries on this tree: The first kind of query takes a node x and assigns all the nodes in the path from root to x the same number y which was not present in the tree (y can take any value which is not present in the tree). The second kind of query takes a node x and returns the average distance from root to a node in the subtree rooted at x (the distance of a node from root is the sum of weights of the edges in the path from root to the node). QUICK EXPLANATION:The average distance from root to a node in the subtree rooted at x is the sum of two values: All the three queries (update query, and the above two queries) can be answered in O (lg N) time in a balanced tree as well as in a caterpillar tree using a combination of binary index tree and segment tree. Hence, an algorithm based on heavylight decomposition can be used to handle the queries in a general tree in O (lg^{2} N) time. EXPLANATION:Let us denote the root of the tree by r. The weight of an edge between two nodes a and b is given by c(a, b). c(a, b) = 1, if a and b have different numbers The distance between two nodes a and b is denoted by d(a, b) which is the sum of the weights of all edges in the path from a to b. If the nodes in the subtree rooted at x are x_{1}, x_{2}, …, x_{m}, then the average distance from root to a node in the subtree can be calculated as follows: avg = 1/m * [d(r, x_{1}) + d(r, x_{2}) + ..+ d(r, x_{m})] For each node x_{i}, the path from root to x_{i} must pass through x. avg = 1/m * [(d(r, x) + d(x, x_{1})) + (d(r, x) + d(x, x_{2})) + ..+ (d(r, x) + d(x, x_{m}))] Hence, the computation of average can be done in two steps: In the first step we compute d(r, x), and in the second step we calculate the average distance between x and a node in the subtree rooted at x. In a more general version of the problem we can assume that we need to support three kind of queries: Since the update query does not affect the topology of the tree, for each node x we can precompute the size of the subtree rooted at x. This will reduce the third query to compute the sum of distances from node x to the nodes in its subtree. From now onwards, we will assume that the third query corresponds to computing sum of distances. Let us denote the sum of distances in the subtree rooted at x by S(x) Now let us see how many times an edge between nodes x_{i} and x_{j} (if exists) is considered in the above sum. The edge (x_{i}, x_{j}) will come in all the paths from x to x_{k}, where x_{k} lies in the subtree rooted at x_{j}. Hence, S(x) can be written as here sz(b) denotes the number of nodes in the subtree rooted at b, and c(a, b) is the weight of the edge (a, b). In an abuse of notation, lets call the quantity sz(b) * c(a, b) as the generalized weight of edge (a, b). The above definition of S(x) is more useful as it allows to compute the S(x) of a node by using the S() of its children. Therefore the S() of all nodes can be computed using a single traversal of the nodes in reverse topological order. If we define T(x) as T(x) = S(x) + sz(x) * c(parent(x), x), where x is not root T(x) = S(x), if x is root Then, we can see that if the children of x are y1 , y2 …yk, then It can be seen that the update query changes the S(y) for only those y which lie on the path from root to x. An O(h) algorithm:In this section, we discuss an algorithm to answer the above queries in O (h) time, where h is the height of the tree, i.e., the length of the longest path from root to a node in the tree. At each node x, we store the three values: Note that, for any x there can be at most one y such that c(x, y) = 0. Hence, I(x) is well defined. This is because in the update query only nodes along a path are assigned the same number, and each update query assigns a unique number. Since the two children of a node x cannot be the part of a single path from root to some node, both of them cannot have the same number as that of node x. Now, we will discuss how to handle the queries in O(h) time using the above information. 1) Distance query (Computing d(r, x)): 2) Subtree distance sum query (Computing S(x)): 3) Update query: In the first pass we remove the convert the weight of edges from 0 to 1, and update the values (e.g., T(), I()) caused by this, while in the second pass we convert the weight from 1 to 0, and values. First pass: delta := 0 curr := x while (true) { y := I(curr) // Since we are assigning a new number to curr, // the weight of the edge (curr, y) should become 1. // Note that, it is possible that y also lies in the // path from root to x, in which case the edge weight // c(curr, y) still remains zero. However, if this is // the case, the edge weight will be changed to zero // in the second pass. if (y != 1) { T(y) += sz(y) // sz(y) * c(curr, y) delta += sz(y) c(curr, y) = 1 I(curr) = 1 } T(curr) += delta if (curr == r) break curr = parent(curr) } second pass (almost the dual of first pass): delta := 0 curr := x prev := 1 while (true) { // The earlier pass ensures that all the edges // encountered in this pass have weight 1, and should // be converted to zero. if (prev != 1) { T(prev) = sz(prev) // sz(prev) * c(curr, prev) delta = size(prev) c(curr, prev) = 0 I(curr) = prev Easy to see that complexity of this operation is O(h) This means that if we have a balanced tree where h = O (lg N), then we can support all operations in O (lg N) time. Next, we consider the highly unbalanced tree, caterpillar tree, where height is O (N), and show how to handle the queries in O (lg N) time, and later we will combine the two to give an algorithm for general trees. An O (lg N) algorithm on a caterpillar tree:A caterpillar tree is a tree, which has a central path, and each node is within distance 1 from the central path. An example is shown below. Clearly, the height of a caterpillar tree could be O (N), hence the algorithm of previous section will not be efficient here. We use a variant of binary index tree to perform the three kind of queries in O (lg N) time in this tree. We use the following data structures to achieve the same. For each node x which is not part of the central path, we store the same information as in the previous section, i.e., Only information (2) and (3) is nontrivial, and in fact both of them are the same. However, for general trees they will be different. For the central path, we store the following information: 2) Segment tree GeneralizedEdgeWeightTree, storing the generalized edge weights of links. The generalized edge weight of an edge (a, b) is c(a,b) * sz(b). In the above example this segment tree will corresponds to the array 3) Binary index tree NonChainT, storing the T(x)’s of the nonchain nodes, i.e., the kth element of the array will correspond to the sum of T() values of all nonchain nodes which are the children of kth node. In the above example this binary index tree corresponds to the array 4) ZeroWtEdges, vector of nonchain nodes in decreasing order of height, which have zero weight edge with their parents. Initially, this list will be empty. Note that we can combine some of the above information into a single segment tree, but for clarity we store them separately. A binary index tree takes an array and supports querying subarray sums and updating a single value in the array. The reference shows how this can be implemented. Since we are storing the T() values of nonchain nodes in the binary index tree NonChainT, we can easily update any of them, or computing the sum of T() value of nonchain nodes whose whose parents form a subchain of the central path. A segment tree takes an array, and along with the queries supported by binary index tree, it also supports setting all values in a subarray to be zero. We will show later how to support this extra kind of query. For the time being let us assume that we have a black box data structure which performs these queries in O (lg N) time. Now lets use see how to support the three kind of queries on the caterpillar tree using these data structures. Distance query: Subtree distance sum query: Now, assume that the node x lies in the central path. We can compute c(parent(x), x) using the segment tree storing edge weights, hence if we can compute T(x) we are done. T(x) can be computed by iterating over all edges of the subtree rooted at x, and adding their generalized weights together. This means T(x) = sum of T() of all nonchain nodes whose parents are in the subchain [x, tail] + sum of generalized weights of all links in the subchain [x, tail] Here tail is the last node of the chain. The first part of the sum can be computed using the binary index tree NonChainT, storing T() values of nonchain nodes, and the second part of the sum can be computed using the segment tree GeneralizedEdgeWeightTree storing generalized weights. Both will take O (lg N) time. Update query: First pass: // Convert the weight of edges from chain nodes to nonchain nodes from 0 to 1 // These edges are stored in ZeroWtEdges in decreasing order of height while (!ZeroWtEdges.empty()) { y := ZeroWtEdges.back() z := parent(y) if (height(z) > height[x]) break // Since we are assigning a new number to all nodes in the chain [r, x], // the weight the edge (z, y) is going to be changed to 1. c(z, y) = 1 T(y) += sz(y) NonChainT.increment(z, sz(y)) ZeroWtEdges.pop_back() } Since the size of vector ZeroWtEdges can be O (N), a single query make take up to O (N lg N) time. However, the amortized time remains O (lg N). This is because each update query adds at most one element in the vector, hence after k queries there can be at most k elements. Hence, we can perform at most k pop operations on this list. Second pass: Next, we need to set the edge weights, and generalized edge weights of all links in the subchain [r, x] to zero, where x is the chain node closest to the one chosen in update query. This can be done by the special operation of our segment tree, which allows setting all values in a subarray to zero in O (lg N) time. This completes the implementation of the three kind of queries. Now let us see how to implement the segment tree which support our desired operations: Segment tree supporting subarray nulling:A segment tree is a balanced binary tree where each node stores information about a subarray. If the subarray corresponding to a node x contains the subarray corresponding to a node y, then y must be a successor of x in the segment tree. In our case each node stores the sum of the corresponding subarray. In addition to the sum of the subarray, we also store a new boolean is_null at each node, which is true if all elements in the subarray are zero, and false otherwise. Using this variable we can update the segment tree lazily in case of a subarray nulling operation. // sets the value at index idx to val (nonzero) // The variable in_null_zone is true, if the underlying tree // corresponds to a subarray whose all elements are zero. void set(tree *t, int idx, int val, bool in_null_zone) { if (!t.contains(idx)) { // We are splitting a null zone into two, hence we need // to pass the null information to children. if (in_null_zone) { t>sum = 0 t>is_null = true } return } // debunking a nullzone as a nonzero value is being put in this subarray. if (t>is_null) { in_null_zone = true t>is_null = false } if (t>is_leaf()) { t>sum = val return } set_val(t>left, idx, val, in_null_zone) set_val(t>right, idx, val, in_null_zone) t>sum = t>left>sum + t>right>sum } // Sets all the elements in the subarray [s, e] to zero void create_null_zone(tree *t, int s, int e) { if (t>is_null) return if (!t>overlaps(s, e)) return // The whole subarray becomes zero. if (t>is_contained(s, e)) { t>sum = 0 t>is_null = true return } // Partial overlap, create null zones in children, // and then merge (if possible) create_null_zone(t>left, s, e) create_null_zone(t>right, s, e) t>sum = t>left>sum + t>right>sum t>is_null = t>left>is_null && t>right>is_null } // Computes the prefix sum of the subarray [0, idx] int prefix_sum(tree *t, int idx) { if (!t>overlaps(0, idx)) return 0 if (t>is_zero) return 0 if (t>is_contained(0, idx)) return t>sum return prefix_sum(t>left, idx) + prefix_sum(t>right, idx) } General tree:We have seen that if our tree is balanced, or is a caterpillar tree, then we can handle the three kind of queries in O (lg N) time. These approaches can be combined together in such a way that the queries on the general tree can be answered in O (lg^{2} N) time. This is achieved via heavylight decomposition of the tree. The main idea in the heavy light decomposition is to partition the tree into disjoint chains such that the path from any node to root passes through at most O (lg N) chains. Since each chain can be visualized as a caterpillar tree, the problem then becomes equivalent of solving the problem on O (lg N) caterpillar tree resulting in O(lg^{2} N) complexity. As an example in the above figure if we have 4 nonsingleton chains C0, C1, C2, C3 whose corresponding trees are caterpillar trees. If we want to pick a node x in C0, and want to calculate its distance from root r, we need to traverse the chains which lie in the path from x to root, and these are {x}, C0, C1, C3. Next, we consider the part of each chain which lie on the path, and compute its weight by solving the problem on that caterpillar tree, and finally combine them together to get the path weight d(r3, x) d (x, x) + d(r0, parent(x)) + d(r1, parent(r0)) + d(r3, parent(r1)) + Note that all the distance queries are on a single caterpillar tree, and all the edge weight queries are for the edges between root of a caterpillar tree and its parent. In a similar way, the update query can be split among the chains overlapping the path from root to x. We need to perform the following updates on the caterpillar trees: For the distance sum query, we need to consider only the chain where x lies, and use the same methods as that in the case of a caterpillar tree to compute the T(x), and then S(x) using the formula mentioned above. So now the questions is how do we partition the tree into chains such that no more than O (lg N) chains lie on any path from root to some node. The splitting into chains with such property is called heavylight decomposition, and is described here. For a node x, if one of its children y has more than half of the nodes (i.e., sz(y) > 0.5 * sz(x)), then x and y are in the same chain, otherwise they belongs to different chains. Since there can be at most one such child of a node, the chain structure is well defined. Moreover, each nonchain edge (x, y) means the size of the subtree at x is more that twice the size of subtree at y. Hence in any path there can be at most O (lg N) nonchain edges, i.e., at most O (lg N) chains. As a side note, sometimes the query time on a caterpillar tree depends on the degree of the caterpillar tree as well (e.g., O (d + lg N), O(d * lg N) etc.). Most often, "binarizing" the tree as mentioned here helps reducing the time complexity to O (lg N) in such cases. However, in this problem such tricks were not required. TIME COMPLEXITY:O (lg^{2} N) per query RELATED PROBLEM:In the above problem also support the query which takes a node x and returns the average distance between two nodes in the subtree rooted at x. AUTHOR'S AND TESTER'S SOLUTIONS:Author's solution: admins, please upload.
This question is marked "community wiki".
asked 05 Dec '13, 19:16

I didn't use HeavyLight Decomposition for this problem. For this problem, I used a segment tree and lowest common ancestor (LCS) only. First, we see that the shape of the tree is not affected by any of the queries, so we precompute the size of the subtree rooted at each node, and reduce the qquery from avg to sum. Next, we consider an Oquery, where a new gangster marches from a capital to a particular node. Now, as the new gangster traverses the path to the node, it is clear that some edges will be flipped from 0 to 1 or 1 to 0. Therefore, an Oquery can be reduced to a sequence of queries of the following type (let's call it an fquery): "For a given node, flip the edge from its parent to itself, updating all nodes in the subtree rooted at it (including itself)". How do we know which edges to flip? First, we can see that the territory of each gangster will always remain a contiguous path from a node to one of its descendants, and the lowest node never changes. Therefore, for each gangster, we keep track its lowest and highest node, and for each node that is the highest or lowest node of some gangster, we keep track of the gangster number on that node. Let's first define child(a,n) to be the child of a in the path from a to n (assuming a is an ancestor of n). Let's start at the capital, and assume we are going towards the node u. Clearly there's already some gangster having the capital as territory. Let's call the lowest node of this gangster L. Then there are four possible things to happen:
It is also necessary to update the lowest and highest nodes of each gangster we pass through. It is clear that we need to support three other subqueries, (1) given a node n and one of its ancestors a, determine child(a,n), (2) given two nodes a and b, determine LCS(a,b), and (3) given two nodes a and b, determine if b is an ancestor of a, a descendant, or neither. Both queries (1) and (2) can be done in O(log N) time by storing, for each node, its height and its 2^{i}th ancestor for every i ≥ 0. Query (3) can be reduced to (2) by checking if LCS(a,b) = b, LCS(a,b) = a, or neither. Therefore, we can determine which edges to flip in O(e log N) time, where e is the number of such edges. So far, we have reduced the problem to solving a number of the following queries:
Let's take a look at the fquery and see if we can simplify it a bit. First, we know that an edge can only have a weight of 0 or 1. Second, we know that all edges are initially 1, and the initial distance of every node from the capital is its height. Now, to implement an fquery, we do the following:
These two operations, together with qqueries, can all be accomplished in O(log N) time by first doing a preorder traversal of a tree, and constructing a segment tree on the resulting traversal array. By doing a preorder traversal, we guarantee that each rooted subtree is in a contiguous subarray, and by doing lazy propagation, a segment tree can support update and sum queries in O(log N) time. (I found an amazing description here). The only problem remaining now is: How many fqueries will we perform? It is clear that a single Oquery can potentially yield N1 fqueries, and if one considers that there are Q = 100000 possible queries, it seems that there will be too many fqueries that we will need. However, let us determine how many fqueries we will need exactly. For a particular edge adjacent to the root, consider how many Oqueries pass through it, and how many don't. Let's call the former x, and the latter Qx. It is clear that we can maximize the number of fqueries on this edge by interleaving the Oqueries that pass through it and those that don't, giving 2·min(x,Qx) fqueries. Let us define F_{N}(Q) as, given N nodes and Q Oqueries, the maximum number of fqueries we will need to perform on a tree with N nodes. Consider an edge from the root to one of its children. Let x be the number of Oqueries that pass through this edge, and Qx be the number of Oqueries that don't. Then F_{N}(Q) can then be expressed as: F_{N}(Q) = max_{x,y} [F_{y}(x) + F_{Ny}(Qx) + 2·min(x,Qx)] Assuming F_{N}(1) = N, this recurrence can be solved as F_{N}(Q) = O(N + Q log Q) (the choice that maximizes F_{N}(Q) is always x = Q/2). Therefore, we see there aren't that many fqueries after all, and the overall complexity of this approach is: O(N log N + Q log Q log N) time and O(N log N) space. answered 06 Dec '13, 14:44

Very Well Explained! (y)
@admin: Can you upload the author's and tester's solutions? (like in the case of all the other editorials)