BTREE-Editorial

djdolls · October 15, 2014, 9:01pm

PROBLEM LINK:

Practice
Contest

Author: Tom Chen
Tester: Jingbo Shang
Editorialist: Ajay K. Verma

DIFFICULTY:

Hard

PREREQUISITES:

Tree Partition, Lowest Common Ancestor, Persistent Segment Tree

PROBLEM:

We are given a tree with N nodes. We need to process the queries of the following type: for a given set S = {v1, v2, …vk} of k nodes, and a set of k integers {d1, d2, …, dk}, find the number of nodes in the tree which are within di distance from the node vi, for some i.

EXPLANATION:

We will first solve an easier version of the problem where all queries have k = 1, and then use it to solve the case for general k.

Queries with k = 1:

In this case the queries will be of the form: given a node v and an integer d, find the number of nodes which are within d distance from v. As a notation we will call d as range of node v.

Balanced Tree:

Let us further simplify the problem, and assume that the given tree is a rooted balanced tree, i.e., for each node the subtrees rooted at its child nodes are similar in size.

We preprocess the tree, and compute the depth (distance from root node) of all nodes in the tree. We also associate a special data structure (which we name as “depth profile” of the subtree) at each node, the pointer of this structure is stored at all nodes in the subtree rooted at this node.

Suppose, we want to create the depth profile of the subtree rooted at node v. Let us say that node v has m children namely u1, u2, …, um. The depth profile of the subtree contains (1 + m) vectors. The first vector A[] stores the global information of the tree, while the remaining m vectors B[1…m] store the information about the subtree rooted at children nodes. More specifically
A[i] = number of nodes in the subtree rooted at v, which are within distance i from v,
B[i][j] = number of nodes in the subtree rooted at ui, which are within distance j from v.

depth_profile(v) = {A, B[1], B[2], …, B[m]}

Each node in the subtree rooted at v, stores the following information:

pointer to depth_profile(v),
distance of this node from v,
the index of the child node ui, whose subtree this node belongs too, for the root node v, this index is 0.

Since the tree is balanced, each node will store the information about O (lg N) depth profiles, corresponding to the nodes which are in the path from root to this node.

Now, let us see how to use this information to answer the queries. Suppose for a given node w, we want to find the number of nodes within distance d from it. We look at all the depth profiles whose pointer is stored at node w.

Let us say we are looking at the depth profile of node v, which is an ancestor of w and is at a distance r from w, moreover w lies on the subtree of i-th child ui of v. Note that, all the nodes in the subtrees rooted at {u1, u2, …, um} / {ui}, whose distance from v is smaller than (d - r), will be with d distance from w. The number of such nodes will be (A[d - r] - B[i][d - r]). So now we know the number of nodes within distance d from w, which are in the subtree rooted at v, but not in the subtree rooted at ui. If we consider the depth profile of all nodes in the path from root to w, and add the computed value, we will get the number of nodes within distance d from w.

The time taken to answer this query is O (number of depth profiles we looked at). Since the tree is balances, this will be O (lg N).

Lopsided Tree:

In case of a lopsided tree, the height of the tree could be O (N), and hence the query time also could be as large.

However, we can manipulate the root of the tree, so that we only need to store the pointer of O (lg N) depth profiles at a node. We pick the root node v in such a way that t(v) = max (size(u1), size(u2), …, size(um)) is minimum, which will be the most balanced partition at this level. Also, for such v, t(v) <= m/2, where m is the number of nodes in the subtree rooted at v. Now for the depth profile of the subtree rooted at node ui, we can again manipulate the root, i.e., the root of this subtree does not need to be ui, but we will pick the one, which has the most balanced children subtrees. Since the depth profile of v is completely independent than the depth profile of ui, picking a different root for the subtree rooted at ui will not change anything.

This means that we can modify our data structure in a way that the queries for lopsided tree can also be answered in O (lg N) time.

General Queries (k >= 1):

If the given subset has k > 1 nodes, then we cannot just handle each of the k nodes independently, and add the values together, as some nodes of the tree might be counted several times.

For this reason, we partition the tree into k subtrees, each “centered” at one of the k nodes in the give subset. The partition will have the property that if a node lies in the partition centered at vi, and is in the range of one of the nodes of the subset (say vj), then it must also be in the range of vi. The property is important because it says that we can handle the partitions independently, and for each partition we only need to find the nodes which are reachable in the specified distance from the center of the partition.

Tree Partition:

First we create an auxiliary tree with O (k) nodes. This tree contains all the nodes in the given subset S as well as lowest common ancestors of some of the nodes of the subset. In the lack of a picture, lets try to understand this with the help of an example.

Suppose the given tree (rooted at 1, for simplicity) has the following edges:
(1, 2), (1, 3)
(2, 4), (2, 5), (2, 6),
(3, 7), (3, 8),
(6, 9), (6, 10)

If the chosen subset S is {1, 4, 6, 7}, then the auxiliary tree will have the nodes {1, 2, 4, 6, 7}, and the following edges:
(1, 2), (1, 7),
(2, 4), (2, 6).

Note that (1, 7) is an edge which was not present in the original tree. In the general case the edges in the auxiliary tree correspond to the “monotonic path” (a path between a node and its ancestor) in the original tree.

The auxiliary tree can be created easily using a stack. We already have the nodes of the tree in depth first traversal order, as well as a data structure which can answer the lowest common ancestor queries efficiently (you can refer to this editorial to know more about this data structure). In order to compute the additional nodes of the auxiliary tree, we need to traverse the nodes of the subset in the tree depth first traversal order, and add the lowest common ancestor of any two consecutive nodes in the subset.

In the above example the nodes in depth traversal order are (1, 4, 6, 7), we need to add lca(1, 4) = 1, lca(4, 6) = 2, lca(6, 7) = 1, in the auxiliary tree. Hence, the auxiliary tree will have {1, 2, 4, 6, 7} as nodes.

In order to create the edges of the auxiliary tree, we traverse its nodes in the depth first traversal order (of the the original tree), while maintaining a stack of visited nodes. Since we are visiting nodes in the depth first traversal order, if an already visited node u is not an ancestor of current node v, it will never be ancestor of any node visited in future, so we can remove it from the stack. On the other hand, if u is an ancestor of v, then we create an edge (u, v). The following snippet shows the pseudocode of the same.

void CreateEdges(vector &nodes) {
	stack st;
	for (int v : nodes) {
		while (!st.empty()) {
			int u = st.back();
			if (u is an ancestor of v)
			{
				addEdge(u, v);
				break;
			}
			st.pop();
		}
		st.push(v);
	}
}

In our example, the process goes as follows:
visited nodes = {}
stack = {}

visited nodes = {1, 2}
stack = {1, 2}
edge added = {(1, 2)}

visited nodes = {1, 2, 4}
stack = {1, 2, 4}
edge added = {(1, 2), (2, 4)}

visited nodes = {1, 2, 4, 6}
stack = {1, 2, 6}
edge added = {(1, 2), (2, 4), (2, 6)}

visited nodes = {1, 2, 4, 6, 7}
stack = {1, 7}
edge added = {(1, 2), (2, 4), (2, 6), (1, 7)}

Now, we have the auxiliary tree, we are ready to partition the tree. We label all the nodes in the auxiliary tree with one of the nodes in the given subset S, such that if label of node w is v, that means (range (v) - dist(v, w)) is largest among all nodes v in S. This implies that if w is within the range of any node in the S, then it must also be in the range of v.

Such a labeling can be obtained easily by running Dijkstra’s algorithm on the auxiliary tree, as shown in the following snippet.

void Label() {
	max_heap H;
	for (int v : S) {
		H.insert(make_pair(range[v], v));
	}

	while (!H.empty()) {
		(d, u) = H.front();
		H.pop();

		for (v : neighbors(u) in the auxiliary tree) {
			r = dist(u, v);
			if (range[v] >= (d - r)) continue;
				
			range[v] = d - r;
			label[v] = u;
			H.insert(make_pair(d - r, u));
		}
	}
}

Now we know the range and label of all nodes in the auxiliary tree. However, the edges of this tree correspond to monotonic paths in the original tree. Hence, if the two end-points of an edge have different label, we need to find the exact partition of the edge. For example, if we have an auxiliary edge between nodes u and v (u is an ancestor of v), and the two nodes have labels x and y, then there will a node z in the path from u to v, such that all nodes in the path (u → z) / {z} will have label x, while all nodes in the path (z → v) will have label y.

Finding z is quite easy:
range[u] - dist(u, z) < range[v] - dist(v, z)
range[u] - range[v] < dist(u, z) - dist(v, z)
range[u] - range[v] < depth(z) - depth(u) - (depth(v) - depth(z))
range[u] - range[v] +depth(u) + depth(v) < 2 * depth(z)
(range[u] - range[v] +depth(u) + depth(v) + 1) / 2 = depth(z)
This uniquely identifies z.

In other words, all the nodes in the subtree rooted at u will come in the partition centered at x, except the ones which are in the subtree rooted at z, which will come in the partition centered at y.

If u is the root of the auxiliary tree, then not only the nodes in the subtree rooted at u, will be in x’s partition, but also the ones which are outside the subtree.

Handling the Query:

After the partition, we go through the nodes of the auxiliary tree. For each node u, we already know its label and range. Next, we compute the number of nodes of the original tree, which are within range of this node, and belong to the same partition as this node. We show the steps involved in this process below.

Consider the auxiliary edge (parent[u], u), partition this edge into two sub edges (parent[u], z) and (z, u) as explained in the previous section. Compute the number of nodes in the subtree rooted at z, which are within range[u] distance from u, and add them to the answer. If node u has no parent, then consider z to be the root of the original tree.
Consider each auxiliary edge (u, v), partition the edge into (u, z) and (z, v). Compute all nodes in the subtree rooted at z, which are within range[u] distance from u, and subtract this from the answer.

Since, we are subtracting the nodes, which belong to different partition, a node is counted only for one partition. Hence, this will give the right answer.

However, in the above computation, we needed to handle several queries of the form: Number of nodes in the subtree rooted at z, which are within d distance from a node u. Next, we show how to handle these queries.

There could be three possible cases:

neither u is ancestor of z, nor z is ancestor of u: In this case the number of nodes will the same as the number of nodes which are at a distance (d - dist(u, z)) from z.
u is ancestor of z: In this case also the number of nodes will be the same as the number of nodes which are at a distance (d - dist(u, z)) from z.
z is ancestor of u: In this case we first compute the number of nodes at distance d from u (not necessarily in the subtree rooted at z). This can be computed as explained in the section handling the case of (k = 1). Since not all these nodes will be in the subtree rooted at z, we need to subtract the ones which are outside the subtree. Number of these nodes will be (number of nodes at distance (d - dist(u, z)) from z - number of nodes in the subtree rooted at z, which are at distance (d - dist(u, z)) from z).

This means, we only need to handle the queries of the form: find the number of nodes in the subtree rooted at z, which are at a distance d1 from z, i.e., the nodes in the subtree whose depth does not exceed (depth(z) + d1).

If we create a list of all nodes in the depth first order traversal, the nodes in a subtree correspond to an interval in the list. Hence, the above query translates to finding the number of element in a subarray, which are smaller than a given value. This is a classical data structure problem, which can be solved using Persistent Segment Tree in O (lg N) time.

Hence, the queries can be answered in O (k lg N) time.

Time Complexity:

Preprocessing time: O (N lg N)
Query time: O (k lg N)

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution will be put up soon.
Tester’s solution will be put up soon.

whipppedcream · June 24, 2017, 6:54pm

Where are the solutions? T^T