QUERY - Editorial

pragrame · February 16, 2013, 5:37pm

Problem Link:

Practice

Contest

Difficulty:

Hard

Pre-requisites:

Heavy-light Decomposition, Persistent data structures

Problem:

You are given a tree of N vertices. Each vertex is initialized to value 0. Further you are given 3 kinds of operations:

c X Y A B - the “Change” operation: Along the path from X to Y increment the values by A, A+B, A+2B, etc.

q X Y - the “Query” operation: Find the sum of values along the path from X to Y

l X - the “Rollback” operation: Return the tree to the state it was after the X’th Change query.

The input is given in a manner that we require an online solution for the problem (for exact details, refer to original problem statement).

Explanation:

Note: The solution describes using segment trees, heavy light decomposition, and persistence over a tree. To avoid confusion in terminology, “node” refers to the data structure component (segment tree or persistence), whereas “vertex” refers to the vertex of the Tree given in the Problem.

We develop the solution by analyzing the problem under various special cases, as follows:

The Tree is a chain, there are no rollback queries
You have a chain, but there are rollback queries
You have a tree, there are no rollback queries
Overall solution.

Tree is a chain, there are no rollbacks

Here, we have a segment - lets use a segment tree. Now, we wish to update the values of vertices from X to Y. Suppose we have two such operations on the same vertices: X Y A B followed by X Y C D. Then the values will be (going along the path from X to Y)

A
A+B
A+2B

etc.

after the first operation, and

A + C
A+B + C+D
A+2B + C+2D

etc.

after the second operation.

Clearly, this is equivalent to the single operation X Y A+C B+D.

Thus, let us store in our segment tree the pairs (A,B) associated with the operation. Now, when we get to a segtree-node which is completely contained in our X-Y path, we just update the A, B value of that node.

Further, while querying, we would like to return an answer when our query-node is completely contained in our required segment. Thus, we also need to store a field “sum” in our segtree which basically stores the sum of the left subtree + right subtree.

Pseudocode for this is as follows:


update(node cur, 	//current node in segtree we wish to update
	int X, 		//X-val of update-segment wrt current node
	int Y,		//Y-val of update-segment wrt current node
	int A,		//A-val of operation
	int B)		//B-val of operation
	if(cur.left == X && cur.right == Y)
		cur.A += A, cur.B += B;
		return;
	mid = (cur.left + cur.right)/2;
	if(X <= min(Y, mid))
		update(ret.left, X, min(Y, mid), A, B)
	if(max(X, mid+1) <= Y)
		update(ret.right, max(X, mid+1), Y, A + max(0, mid-X+1) * B, B);
	cur.sum = findSum(cur.left) + findSum(cur.right);

findSum(node cur)
	n = cur.right - cur.left + 1;	//#elements in node
	a = cur.A;
	b = cur.B;
	return cur.sum + calc(a, b, n);

calc(a, b, n)	//returns sum of a, a+b, a+2b, a+(n-1)b
	return (n * (2 * a + (n-1) * b))/2;

query(node cur, int X, int Y, int accumA, int accumB)
// similar to update: here accumA and accumB store the sum of A-values and B-values along the path from root to the node, as this is required in finding the overall value of a node at a depth.
	if(cur.left == X && cur.right == Y)
		return cur.sum + calc(accumA, accumB, (Y-X)+1);
	ret = 0
	mid = (X+Y)/2;
	if (X <= min(mid, Y))
		ret += query(cur.left, X, min(Y, mid), accumA+cur.left.A, accumB + cur.left.B)
	if (max(mid+1, X) <= Y)
		ret += query(cur.right, max(mid+1, X), Y, accumA + cur.right.A + accumB * (size-of-left-subtree), accumB + cur.right.B)
	return ret;

This can be found in Setter’s Solution (lines 24 - 54), if you ignore the persistence that is introduced in the modify() function.

You have a chain, but there are rollback queries

Introducing Persistence

Here, we need to keep past information as well. The term used for this is Persistence. If we were to generalize the segment tree approach, we need to implement a segment tree that is capable of going back in time as well as forward in time.

To illustrate persistence, let us first take the example of a singly linked list. Imagine we have the list


A -> B -> C -> E -> F

And we wish to insert the element D between C and E. Also, we need to keep track of both “versions” of the list. That is, we require our list to look like


A -> B -> C -> D -> E -> F

while keeping track of the original. We could make an entirely new copy of the entire list, but this will take O(N) memory and time. The thing to notice is, from E on till the end of the list, it remains completely the same. Hence, we “clone” the nodes A, B, and C, and insert D after the clones. We finally associate the cloned root A with the new version.

Thus, our data structure looks somewhat like this now:


Version0: A -> B -> C -> E -> F
Version1: A' -> B' -> C' -> D -> E -> F

Note that if we had a doubly linked list, then we would need to clone the entire list in order to implement Persistence.

Applying persistence

Now, how does this apply to our generalization? We can imagine our segment tree as a binary tree with links to left and right children. In this setting, what would a modification to the tree look like?

It would just be a path starting from the root upto a particular node! What we need to do here, is clone just the path from the root to the node, and associate the new root with the new version. Here, when the tree has N nodes, a path of length atmost log(N) is being cloned. If you compare with the linked list example, over there, in the worst case we could clone a path of size O(N). Thus, there is a huge improvement in the case of the segment tree.

So, if we were to have a global array of Version-roots, and perform our updation of the tree with cloning, we get exactly what we desire. Refer to Setter’s code : lines 42 - 62 for the description of this.

You have a tree, and no rollback queries

Heavy-light Decomposition of Tree (HLDoT)

The heavy-light decomposition is used to break up a Tree into s set of disjoint paths, with certain useful properties. First, root the tree. Then, for each vertex x, we find the child of x having the largest subtree. Lets call that vertex y. Then the edge x-y is a heavy edge, and all other x-child_vertex edges are light edges.

The most important property of this is, from any vertex x, the path from x to the root goes through at most logN different light-edges. This is because, on this path, whenever you take a light edge, you are atleast doubling the size of the subtree rooted at the new vertex. Hence you cannot perform this “doubling” effect more than logN times.

A number of problems that require you to find some property of the path between two arbitrary nodes of a tree have heavy-light decomposition used. For example, DGCD is one such problem, which asks you to find the GCD of the numbers along the path from X to Y.

If you can solve the problem for a chain using a segment tree, then there is a very good chance that you can solve the problem for a tree using HLDoT. Indeed, if you make segment trees over the heavy edges, then the answer for your path X-Y can be broken up into two paths from X to LCA(X, Y) and from Y to LCA(X, Y). Then, using that you make only logN shifts from one heavy-chain to another, you are actually making only log(N) segment-tree queries.

Applying HLDoT here

Let us perform Heavy Light Decomposition of the tree here. We make chains consisting only of heavy edges. We also need to find LCA efficiently. This can be done in O(N log N) time by storing information up[i] = the ancestor of x which is at a height of 2^i above x. Clearly, up[i] = up[up[i-1]][i-1] (take a 2^(i-1) upward jump from the 2^(i-1)'th ancestor of x). Then,


LCA(x, y):
if(x is ancestor of y) return x;
for(i = logN; i >= 0; i--) 
	if (up[x][i] is not an ancestor of y) x = up[x][i];
return up[x][0];

Now, given an update query, from X to Y, first find L = LCA(X, Y). Then, update path from X to L and from L to Y.

This is accomplished in the Setter’s code lines 195-220. Pseudocode follows. Let chain[vertex] = an identifier for the particular segment tree’s root that we require.


change(L, x, y, a, b):	// Perform the operation X Y A B, where L = LCA(X, Y)
dist = depth[x] + depth[y] - 2depth[L] + 1
lift(x, L, a, b);	// update path x to L with parameters (a, b)
if(y is not L)
	find pL = child of L that is ancestor of y
	//update path from y to pL with parameters (a + b * (dist-1), -b)
	lift(y, pL, a + b * (dist-1), -b)

lift(low, high, a, b):
if(chain[low] = chain[high])
	Modify the chain[low]'th segtree along the path from low to high as required
else
	Let H = Head of the chain[low]'th segtree (i.e. the one nearest to the root)
	Let n = number of nodes on path from low to H
	Modify the chain[low]'th segtree along the path from low to its Head as required
	lift (parent of Head, high, a + (n-1) * b, b)

Overall Solution

The overall solution merges persistence with Heavy Light Decomposition. At the high level, it goes as follows:

Perform heavy-light decomposition to give you information regarding ancestry-relation between nodes, LCA, depth, and mapping vertices to chain-numbers
For a change operation between X and Y, for every chain along the path from X to Y, perform a persistent-change to the corresponding segment trees. For each segment tree, you have an array of the root-nodes that map versions to roots.
For every query operation between X and Y, do the same as step 2, except you need to accumulate queries over various segment trees, and don’t perform any modifications
For every rollback operation, set a global variable (that denotes your version number) to the required version.

Complexity Analysis:

Memory: Each update operation on a segment tree takes atmost O(logN) memory. Each update operation on the tree affects atmost O(logN) chains. Hence, memory complexity is O(N log^2 N).

Time: Updates on each segment tree take O(logN) time. There are atmost O(logN) segment trees to update on an Update Operation. Hence, O(log^2N) per update operation.
Queries behave similarly: O(log^2N) for queries as well.
Rollback operation: O(1) time to update global version-number.

Appendix: Solution for offline problem

The requirement of lastans in the input format ensured that offline solutions would not work here. There is a offline solution to the problem which does not use persistence.

Imagine versions as nodes of a graph. Now, when you have an update operation, that amounts to creating a new version-child of the current node. If you had no rollbacks, then your graph would be a chain. Now that there are rollbacks, you may branch out from an earlier version. In this case you have a tree.

Note that, if you went from a version to its child using the operation X Y A B, then you can go back from the child to its parent using the operation X Y -A -B. Thus, first you build this version-tree, and then you traverse it using a DFS. When you are in a version-node n, then solve all queries that pertain to that node before moving on.

Setter’s Solution:

Can be found here

Tester’s Solution:

Can be found here

anton_lunyov · February 16, 2013, 9:26pm

Complexity analysis is bad.
The actual time and memory for the described solution is O(N log N) (for both).
The reason is that for at most 3 (or 4 in case of bad choice) heavy-light paths among O(log N) paths we do non-trivial query or update.
For all other paths the update/query concerns the whole path and thus performed in O(1) time and add O(1) memory in case of update.
UPD. This is wrong
See @aircube comment below.

But the method of making persistent the collection of segment trees seems wrong.
How should we choose the proper root at the segment tree?
I simply use binary search to find the root with the largest version that <= current version.
I use this method with array of roots in each segment tree but always got WA.
Then I wrote stress-test that simply compare outputs of my solution with different roots in the tree chosen and the following test break my idea completely:

The output when root of tree is 1 and 3 are different for the query.
Note that here we do a change that concerns almost all heavy paths, then do a rollback to zero version and then again do a change that concerns two consecutive vertexes 1 and 3. So currently we are in the version 2, only few segments trees has the root of this version and several other segment trees have roots of versions 0 or 0 and 1 only. Now when we ask query at version 2 and this query involves segment tree that has roots of version 0 and 1 only the correct version to choose would be 0 but in my approach I choose version 1 as the largest version <= 2.

I scratched the head for a while to handle this properly and my idea was to construct a balanced binary search tree where each node corresponds to some segment tree with key equal to the up-vertexes of heavy-light paths.

Then we indeed can have array of roots of this mega-tree for different versions.
When update is performed we do update similarly to the segment tree copying paths to the required nodes. To perform all properly I store all segment trees to update/query in a sorted array (by up-vertex of heavy-light path) and then do a one query to the mega-tree with this array in order to visit each node of the mega-tree only once.

Now indeed I have O(log^2 N) time and memory since union of paths to chosen nodes can be >= C log^2 N is the worst case (>= C log N segment trees to update and they all lie at leafs of mega-tree).

It seems that in tester’s solution something similar is used.
At least I see the history tree is also binary tree there.
I see that in problem-setter’s solution something mostly corresponding to editorial is made.
But there are several other arrays that handle history is used,
but this is not described in the editorial.
However I also see the persistent array there that is made like binary tree.
And also some random solutions from the contest also use the same idea for persistent array.
I am curious whether simpler method exists that uses less time and memory.

xcwgf666 · February 16, 2013, 11:36pm

Thanks for your feedback!

For all other paths the update/query concerns the whole path and thus performed in O(1) time and add O(1) memory in case of update.

I think, it’s not correct because you are actually performing updating of some prefix or suffix of the path.

How should we choose the proper root at the segment tree?

I have used persistent array, it can be implemented as an ordinary segment tree. The “Appendix” part says that versions form the tree. I have used this property and stored persistent array in every vertex of this “tree of versions”. Such an approach allows to get the latest version of every chain. It doesn’t seem to me as the hardest part of this problem. But you are right, maybe it would be better to explain it in more detail.

I can’t say that I’ve understood other people’s solutions completely, so it would be nice to read other approaches to this problem. But I think that the “heart” of the solution is the same in all the AC submissions.

And thanks for explaining your idea

voanhkhach · March 2, 2013, 12:52pm

It’s really a nice editorial! Just being curious, do we have any other decomposition on tree having the same efficiency as HLDoT?

anton_lunyov · February 16, 2013, 11:42pm

As I said only for 3 heavy-lights paths (that contains X, Y and LCA(X,Y)) updates may not touch the whole path. Each other part of the path from X to Y is a complete heavy-light path, so update will touch the whole path.

In my solution I use bound about 8 millions for the total of nodes in all segment trees.
It exactly corresponds to the bound 4 * log N * N.
So I think my analysis is correct.

The log^2 N time and memory comes only from persistent array implementation, which is exactly my “mega-tree”
Hence I am curious whether it can be made more efficient.

xcwgf666 · February 17, 2013, 12:07am

Oh, I’ve always thought that it requires O(log^2N) complexity

Google says, that there exist some persistent array implementations with O(log log m) time complexity, where m is the number of modifications. So, there exists some (theoretically) faster solution. But I’m not sure that it will be more efficient at this problem.

aircube · February 17, 2013, 12:40am

"The reason is that for at most 3 (or 4 in case of bad choice) heavy-light paths among O(log N) paths we do non-trivial query or update. "

I think it`s wrong. Make tree with following recursive procedure.
Make(N) - make tree with N vertices.

Make(N) :

The right son of root is the chain of (N+1)/2 vertices.
The left son of root is Make(N - (N+1)/2 - 1)

Now, if we start from leftmost vertex going to root we do O(log(N)) non-trivial queries.

anton_lunyov · February 17, 2013, 12:52am

Cool!
I never thought of such possibility.
And what is interesting that for each heavy-light path only the first vertex is updated so we add about log K nodes to the segment tree corresponding to each heavy-light path (K is its length). So the change query here add about log^2 N / 2 new nodes.

Then it only means that test data are weak

abcdexter24 · September 15, 2014, 11:44pm

thanks @anton_lunyov