PROBLEM LINK:Author: Praveen Dhinwa PREREQUISITES:Greedy algorithms, rooted trees, dynamic programming PROBLEM:(Note: this is a translation of the story in terms of rooted trees) You are given a rooted tree of $N$ nodes with the root node being $1$. You are given an array $A$ having $K$ distinct integers, each denoting a level of the tree. For every element $A_i$ of the array, you must delete a node from level $A_i$. However, when you delete a node, all nodes in its subtree are automatically removed as well. You may choose which order to perform the deletion. What is the minimum number of nodes that must get deleted? QUICK EXPLANATION:It is clear that we should delete nodes in decreasing level. Now, let $F(n)$ be the answer to the problem considering only the values $A_i$ having $A_i \ge n$. Then $F(n)$ can be computed recursively as follows:
This only takes $O(N)$ time overall because each node appears in only one level and thus checked only once during the computation of the $F(n)$s. EXPLANATION:The first thing that may come to mind is to take the smallestvalue in the array $A$, and take the longest path that reaches all levels whose numbers are in $A$, in the hopes of minimizing the number of subtrees cut, and therefore possibly the number of nodes cut. However, this greedy selection doesn't always work, as shown by the sample input! Other greedy solutions that comes no mind are: take the smallest/largest sized subtree, or take the smallest/largestheight subtree, or something else. But one can probably find some counterexamples to those strategies. We need to find a solution that is convincingly correct. Order of deletionOne of the first things we can think about is the order in which we process the levels $A_i$. Intuitively, we should process them in decreasing order, because if we try to delete from a lower level first, we might have to remove another tree from a higher level when we don't have to. An example is the following tree:
with $A = [2,3]$. Let's say we delete a node from level $2$ first, and without loss of generality we delete B. Now we have to delete some node from level $3$, and our only remaining choice is E. So we have removed three nodes, but we could have removed only two by deleting, say D, first, and then B! Okay, so our intuition tells us that we should delete nodes in decreasing order of levels. Let's try to prove this a bit more rigorously. Suppose we have an optimal solution for the problem whose deletions are not in decreasing order of level. Let's say the order of levels is $A_{i_1}, A_{i_2}, \ldots, A_{i_K}$. Since this is not a decreasing sequence, there thus exists a position such that $A_{i_j} < A_{i_{j+1}}$ for some $j$. Let $X$ and $Y$ be the nodes deleted at $A_{i_j}$ and $A_{i_{j+1}}$, respectively. If $Y$ is in the subtree rooted at $X$, then it would have been removed when $X$ was deleted, so we could not have been able to choose $Y$ in the first step. Therefore, $Y$ is not in the subtree rooted at $X$. Also, obviously $X$ is not in the subtree rooted at $Y$ because $X$ has a lower level than $Y$. This means that the order of deleting $X$ and $Y$ doesn't matter, and we can swap the order in which $X$ and $Y$ are deleted while still maintaining optimality. We can do this for all other positions $A_{i_j} < A_{i_{j+1}}$ until there are none. Since at each step we are increasing the inversion number of the array, we are sure that this process terminates in a finite number of steps (because the inversion count of the array is bounded by $K(K1)/2$), and the end result is an optimal solution with the nodes deleted in decreasing order of level, which is what we wanted to show! Which nodes to deleteNow that we know that we can delete the nodes in decreasing order of level, we can now find the correct choice of a node to delete from each level in $A$. Now, we assume that $A$ is reverse sorted, so $A_1 > A_2 > A_3 > \ldots > A_K$. Let $f(k)$ be the optimal solution if we consider only the first $k$ elements of $A$ (for $0 \le k \le K$). Obviously the answer to the problem is $f(K)$. $f(0)$ is easy to compute: it is simply $0$, because we don't have to delete any node! So let's try to solve something else. Suppose we want to compute $f(k)$ for $k > 0$. As we have seen earlier, it seems not wise to make a greedy choice, so we'll check all subtrees at level $A_k$ instead! We'll look at a particular node $X$ (at level $A_k$) and see what will happen if we delete that node. If we delete $X$, the next choice we make is the node rooted at $A_{k1}$ (remember that this node is deleted before $X$). Suppose the highest level of any node under $X$ is $L$. Now, if $A_{k1} \le L$, then we can just select any node under $X$ of level $A_{k1}$, because $X$ will be deleted anyway! This gives us a choice of node at the level $A_{k1}$ for free, and in fact for all $A_{k'}$ such that $A_{k'} \le L$. Therefore, the next level we have to make a selection for is the smallest $A_{k'}$ such that $A_{k'} > L$. But in this case, the answer is simply $f(k')$ by definition! Therefore, we have shown that if we delete $X$, then the number of nodes we will delete in total is $\text{size}(X) + f(k')$, where $\text{size}(X)$ is the number of nodes under $X$ (including $X$ itself), and $k'$ is maximal such that $A_{k'} > L$ (or $k' = 0$ if such a $k'$ doesn't exist). This gives us a recurrence for $f(k)$: $$f(k) = \min_{\text{level}(X) = A_k} \text{size}(X) + f(\min\{k': A_{k'} > L(X)\})$$ (where $L(X)$ is the highest level of any node under $X$, which is simply $\text{level}(X) + \text{height}(X)$) Using dynamic programming to store the values of $f(k)$ and computing it using this recurrence, we now have an $O(N + K \log K) = O(N \log N)$ solution for this problem! (the $\log N$ part comes from binary search to compute $k'$) Removing the binary searchWe can actually remove the binary search problem above by altering the solution by a little bit. Let us define $F(n)$ as the optimal solution if we only consider the elements of $A$ such that $A_i \ge n$. Then we have the following observations:
Therefore, we can construct the array $F$ instead, from $F(L+1)$ to $F(1)$, we can compute the answer as $F(1)$, in $O(N)$ time :) Time Complexity:$O(N)$ AUTHOR'S AND TESTER'S SOLUTIONS:Will be posted soon
This question is marked "community wiki".
asked 22 Jun '15, 21:03
