STETSKLX - Editorial

PROBLEM LINK:

Contest
Practice

Author: Antoniuk Vasyl
Tester: Hiroto Sekido
Editorialist: Kevin Atienza

DIFFICULTY:

Medium-Hard

PREREQUISITES:

Binary search, unrooted trees, sliding range minimum query, divide and conquer

PROBLEM:

Given a weighted unrooted tree with N nodes, and two integers L and R, find the minimum Chef length among all simple paths whose number of edges is between L and R, inclusive. If there is no such path, output -1.

The Chef length of a path is the “median” of the list of weights of all edges in the path. If the path has an even length, the “median” is defined as the larger of the two middle elements.

QUICK EXPLANATION:

Binary search on the answer V: for a given V, we need to figure out whether there is a path of length in [L,R] whose Chef length is \le V. In the following, V is fixed.

Let’s call an edge good if its weight is \le V. Call a path good if its Chef length is \le V. Thus a path is good if and only if the number of good edges is more than half the path’s length. The following algorithm computes whether there is a good path or not.

Root the tree at an arbitrary node, and for each node i and each length l \in [1,R], define G(i,l) as the maximum number of good edges of any downward path of length l starting from \text{parent}(i) whose first edge goes down to i. Define it as -\infty if such a path (or \text{parent}(i)) doesn’t exist. Also, define H(i,l) = 2G(i,l) - l. There are 2NR such values, and all of them can be computed in O(NR) time (using dynamic programming).

If there exists a pair (i,l) with L \le l \le R and H(i,l) > 0, then return true. Otherwise, try to find two distinct nodes i and j and two lengths a and b such that \text{parent}(i) = \text{parent}(j), H(i,a) + H(j,b) > 0 and 1 \le a, b \le R and L \le a + b \le R. If such values exist, return true. Otherwise, return false.

The last part could take O(N^2 R^2) or O(NR^2) time when implemented naïvely, but it can be computed in O(NR) time using a few tricks, for example a sliding range maximum query (in at least one approach).

EXPLANATION:

Binary search

When finding the minimum of something, one standard approach is to binary search the answer. In more detail, suppose \phi is a predicate, and suppose we want to find the minimum V satisfying \phi(V), and suppose that for any given V we can calculate (the truth-value of) \phi(V) in time, say, O(T). Then the following approach tries to calculate the minimum V such that \phi(V) is true:

def minimize(phi):
    Lo = some (small) value
    Hi = some (large) value

    # we need phi(Lo) to be false and phi(Hi) to be true
    if phi(Lo) is true or phi(Hi) is false:
        return [failure]

    # invariant: phi(Lo) is false and phi(Hi) is true
    while Hi - Lo > 1:
        Md = (Lo + Hi) // 2   # integer midpoint
        if phi(Md):
            Hi = Md
        else:
            Lo = Md

    return Hi

It’s easy to see that the running time is O(T \log (Hi - Lo)), thus this approach only adds a “log factor” in the running time (which is small assuming Hi - Lo is manageable).
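As a toy illustration of this loop (our own example, unrelated to the problem), here it is as runnable Python, finding the minimum V such that V^2 \ge 50:

def minimize(phi, lo, hi):
    # requires: phi is monotonic, phi(lo) is False, and phi(hi) is True
    if phi(lo) or not phi(hi):
        return None                 # [failure]
    while hi - lo > 1:
        md = (lo + hi) // 2
        if phi(md):
            hi = md
        else:
            lo = md
    return hi

print(minimize(lambda v: v * v >= 50, 0, 100))   # prints 8 (8^2 = 64 >= 50, 7^2 = 49 < 50)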

However, this algorithm only works if the following conditions are satisfied:

  • \phi(V) is monotonic, i.e. if \phi(V) is true, then \phi(V+1) is also true.
  • We can find values Lo and Hi where \phi(Lo) is false and \phi(Hi) is true. There are ways to systematically find such values (if they exist), but thankfully in our current problem we can easily find such values Lo and Hi, as will be described later.

This is a nice approach, because we’re reducing the problem of minimizing V that satisfies \phi(V), into a binary question: “given V, is \phi(V) true?”, while only adding a small running time factor.

Thus, we’ll try to answer the following question: Given V, is there a path of length in [L,R] whose Chef length is \le V? It’s easy to see that this predicate is monotonic, and that the minimum value satisfying this is the answer to the problem. For this problem, we can use Lo = 0 and Hi = \max c. Obviously the predicate is false on Lo (0 can’t be a median because all edges are positive), but if the predicate is false on Hi, then this can only mean there is no path of length in [L,R], so we output -1.
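Concretely, the driver for this problem might look like the following sketch (our own code; exists_good_path(V) is a hypothetical helper implementing the check developed in the rest of this editorial):

def solve(edges, L, R):
    # edges: list of (u, v, c) triples describing the tree; exists_good_path(V)
    # answers "is there a path with length in [L, R] and Chef length <= V?"
    lo = 0       # phi(0) is false: 0 can't be a median since weights are positive
    hi = max(c for (u, v, c) in edges)
    if not exists_good_path(hi):
        return -1    # there is no path with length in [L, R] at all
    while hi - lo > 1:
        md = (lo + hi) // 2
        if exists_good_path(md):
            hi = md
        else:
            lo = md
    return hi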

Is there a path?

Let’s fix a V and try to check whether there is a path of length in [L,R] whose Chef length is \le V.

Chef length

Let’s first go more in-depth on what the Chef length looks like.

Let’s call an edge good if its weight is \le V. Call a path good if its Chef length is \le V (thus we are checking whether there exists a good path in the tree with length in [L,R]). It’s easy to see that a path is good if and only if the number of good edges is more than half the path’s length. (We invite the reader to verify this; note that there are two cases, depending on the parity of the path length.)

Next, let’s suppose we have a path of length l with g good edges (0 \le g \le l). Then this path is good if and only if \frac{g}{l} > \frac{1}{2}. This is equivalent to:

\begin{aligned} \frac{g}{l} &> \frac{1}{2} \\ 2g &> l \\ 2g - l &> 0 \end{aligned}

This suggests that the expression 2g - l is important for any given path. Let’s call this the path’s goodness index. The goodness index has the following useful property: the goodness index of the concatenation of two edge-disjoint simple paths is equal to the sum of the goodness indices of the two paths (we invite the reader to verify!).
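As a quick sanity check (our own toy example): take V = 3 and a path with edge weights [2, 5, 3]. Then g = 2 and l = 3, so the goodness index is 2 \cdot 2 - 3 = 1 > 0; indeed the median weight is 3 \le V. The short brute-force script below tests the equivalence on random weight lists:

import random

def chef_length(ws):
    # the median; for even lengths, the larger of the two middle elements
    s = sorted(ws)
    return s[len(s) // 2]

for _ in range(10000):
    V = random.randint(1, 10)
    ws = [random.randint(1, 10) for _ in range(random.randint(1, 8))]
    g = sum(1 for w in ws if w <= V)
    assert (chef_length(ws) <= V) == (2 * g - len(ws) > 0)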

Unrooted tree

When solving a problem related to unrooted trees, one usual strategy is to root the tree. This helps one to “orient” or “organize” the tree in some way. There are multiple choices for the root, each with its own advantages and disadvantages. In the current problem, we’ll try to just root the tree arbitrarily and see whether we can find a fast solution (because it’s cumbersome to find special nodes in the tree). So in the following, let’s assume that the tree is rooted at 1.

Once the tree is rooted, we now see what the “simple paths” look like. Specifically, there are two kinds of simple paths:

  • A pure downward path, which is a path between two nodes, where one is an ancestor of the other.
  • An up-down path, which is a path between two nodes, where neither is an ancestor of the other.

Furthermore, one can observe that an up-down path is simply the concatenation of two pure downward paths. Therefore, let’s define the following function: g(i,l), the maximum number of good edges of any downward path of length l starting at node i (if there is no such path, define it as -\infty). Furthermore, let’s define h(i,l) = 2g(i,l) - l, which is the maximum goodness index of any downward path of length l starting at node i. We will define this for all 1 \le i \le N and 1 \le l \le R, so that there are NR possible indices. We’ll describe how to calculate the g(i,l) and h(i,l) later, but for now, assume that we already have those values.

For convenience, let’s define an additional function G(i,l): the maximum number of good edges of any downward path of length l that starts at \text{parent}(i) and whose first edge goes down to i, i.e., G(i,l) = \text{good}(\text{parent}(i), i) + g(i, l-1) (and -\infty if \text{parent}(i) doesn’t exist). This function is convenient if we want to inspect paths that go down a particular child of \text{parent}(i). Define H(i,l) to be 2G(i,l) - l.
Notice that:

  • There exists a good pure downward path of length in [L,R] if and only if there is a node i and a length l \in [L,R] such that h(i,l) > 0. Once we have all the h(i,l) values, this can be computed in O(NR) time by doing a single pass through the whole h table (see the sketch after this list).
  • There exists a good up-down path of length in [L,R] if and only if there are two distinct nodes i and j and two lengths a and b such that \text{parent}(i) = \text{parent}(j), 1 \le a, b \le R, L \le a + b \le R and H(i,a) + H(j,b) > 0. (This is a bit more complicated than the previous case, but it essentially amounts to dissecting an up-down path into two pure downward paths at their common ancestor, and computing the goodness index of the up-down path as the sum of the goodness indices of the two halves. We invite the reader to verify this carefully.) Computing this quickly is not as straightforward, and will also be described later.
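For the first bullet, the single pass might look like this minimal sketch (our own indexing; h is stored as a 2-D array h[i][l]):

def has_good_downward_path(h, N, L, R):
    # h[i][l]: max goodness index of a downward path of length l starting at i,
    # taken to be -infinity when no such path exists
    return any(h[i][l] > 0
               for i in range(1, N + 1)
               for l in range(L, R + 1))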

Computing g(i,l) (and h(i,l) and G(i,l) and H(i,l))

Let’s focus on computing the $g(i,l)$s. It is straightforward to create a recurrence for g(i,l). Let’s define \text{good}(i,j) as 1 if the edge connecting i and j is good, and 0 otherwise. Then:

g(i,l) = \max_{\text{$j$ is a child of $i$}} \left( \text{good}(i,j) + g(j,l-1) \right)

For the base case, we say that g(i,0) = 0 (this represents the empty path starting at i); if i has no children, then g(i,l) = -\infty for all l \ge 1 (the maximum over an empty set). Implementing this in a straightforward way results in an O(NR)-time algorithm.

Once all g(i,l) are computed, h(i,l), G(i,l) and H(i,l) can then be computed straightforwardly, also in O(NR) time.
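For concreteness, here is a minimal sketch of this DP in Python (our own naming and representation: children[i] lists the children of i in the rooted tree, and wt[(i, j)] is the weight of the edge between i and its child j):

NEG = float('-inf')

def compute_g(n, root, children, wt, R, V):
    # g[i][l]: max number of good edges on a downward path of length l from i
    g = [[NEG] * (R + 1) for _ in range(n + 1)]
    order, stack = [], [root]
    while stack:                  # build a root-to-leaves order iteratively
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    for i in reversed(order):     # children are processed before their parents
        g[i][0] = 0               # the empty path starting at i
        for j in children[i]:
            good = 1 if wt[(i, j)] <= V else 0
            for l in range(1, R + 1):
                if g[j][l - 1] != NEG:
                    g[i][l] = max(g[i][l], good + g[j][l - 1])
    return g

The iterative order (instead of a recursive DFS) also sidesteps deep-recursion issues on path-like trees.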

Existence of a good up-down path of length in [L,R]

This is the remaining missing piece of our algorithm. A naïve search for a valid quadruple (i,j,a,b) runs in O(N^2R^2) time (because there are O(N^2R^2) such quadruples), which is clearly too slow.

To improve this, let’s rephrase the problem a bit. Instead of checking whether there exists a quadruple (i,j,a,b) such that H(i,a) + H(j,b) > 0, we check whether the maximum value of H(i,a) + H(j,b) among all quadruples (i,j,a,b) is > 0. It’s easy to see that these are equivalent, but the latter allows for some optimizations which will spare us from checking all quadruples.

Next, let’s use the fact that i and j have the same parent. Let p be a particular node (which will be the common parent of our i and j), and let c_1, \ldots, c_k be the children of p in some fixed order. Then we want to compute the maximum value of H(c_I,a) + H(c_J,b) among all quadruples (I,J,a,b) with 1 \le I < J \le k.

Suppose we fix J, a and b. We now want to find the maximum H(c_I,a) + H(c_J,b) for all 1 \le I < J. Since H(c_J,b) is fixed, we simply want to maximize H(c_I,a) for 1 \le I < J. The key observation is that this can be computed by maintaining the running maximum of H(c_I,l) for any 1 \le l \le R. The following pseudocode illustrates it:

m[1...R] # all initialized with -INF
for J = 1...k:
    # at this point, m[l] contains the maximum H(c_I,l) over all 1 <= I < J
    for all valid pairs (a,b):
        if m[a] + H(c_J,b) > 0:
            return true

    # update m[l] with the values of child c_J
    for l in 1...R:
        m[l] = max(m[l], H(c_J,l))

It can be seen that this pseudocode detects whether there exists a quadruple (I,J,a,b) such that H(c_I,a) + H(c_J,b) > 0. The running time is now O(NR^2), which is a lot better than before. Unfortunately, this is still too slow.

Let’s try to shave off a factor of O(R). Focus on the following loop (which runs in O(R^2) time):

for all valid pairs (a,b):
    if m[a] + H(c_J,b) > 0:
        return true

Let’s expand the for loop:

for b in 1...R:
    for a in 1...R:
        if L <= a + b <= R:
            if m[a] + H(c_J,b) > 0:
                return true

For a fixed b, the values of a we will consider are those satisfying 1 \le a \le R and L \le a + b \le R. This is equivalent to \max(1, L-b) \le a \le R - b. Therefore, the above double loop is equivalent to:

for b in 1...R:
    M = max{m[a] : max(1, L-b) <= a <= R-b}
    if M + H(c_J,b) > 0:
        return true

We’ve almost reached a better running time. All that remains is to compute M quickly. Notice that M is the maximum of a subarray of m. But this is a standard range maximum query (RMQ) problem! Thus, each M can be computed in O(\log R) time using a segment tree (with O(R) preprocessing), for a total of O(R \log R) running time.

But we can do even better! We can use the fact that in the query ranges [\max(1,L-b),R-b], the left endpoints and right endpoints are both decreasing in b (not necessarily strictly). This makes the problem solvable with a sliding RMQ algorithm, which requires no preprocessing and has amortized O(1) query time. Furthermore, the implementation is really simple and requires a single deque. With such a structure, the loop becomes O(R), which was our target.
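For illustration, here is a minimal sketch of the inner check with a sliding-window maximum (our own code, not the setter’s). m and Hj = H(c_J, \cdot) are 1-indexed arrays of length R+1; since both endpoints of [\max(1,L-b), R-b] decrease as b increases, we iterate b downward so the window only moves rightward, which is the direction a deque handles naturally:

from collections import deque

def has_good_pair(m, Hj, L, R):
    # Returns True iff m[a] + Hj[b] > 0 for some a, b >= 1 with L <= a + b <= R.
    dq = deque()     # indices into m; their m-values are kept in decreasing order
    right = 0        # rightmost index of m added to the window so far
    for b in range(R - 1, 0, -1):
        lo, hi = max(1, L - b), R - b    # valid range of a for this b
        while right < hi:                # extend the window on the right
            right += 1
            while dq and m[dq[-1]] <= m[right]:
                dq.pop()
            dq.append(right)
        while dq and dq[0] < lo:         # shrink the window on the left
            dq.popleft()
        if dq and m[dq[0]] + Hj[b] > 0:
            return True
    return False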

With that final optimization, answering the question “Given V, is there a path of length in [L,R] whose Chef length is \le V?” can be done in O(NR) time. With the binary search, the overall algorithm runs in O(RN \log \max c).

Binary search improvement

Finally, we introduce a small optimization on the binary search. Notice that the answer must be one of the N-1 edge weights. Therefore, we can instead binary search over the sorted array of edge weights, so that the running time becomes O(RN \log N) instead of O(RN \log \max c).
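A sketch of this variant (our own code): binary search over indices into the sorted array of distinct edge weights, with phi as before:

def minimize_over_weights(weights, phi):
    ws = sorted(set(weights))    # candidate answers: the distinct edge weights
    if not phi(ws[-1]):
        return -1                # no path with length in [L, R] exists at all
    lo, hi = -1, len(ws) - 1     # phi(ws[lo]) false (sentinel), phi(ws[hi]) true
    while hi - lo > 1:
        md = (lo + hi) // 2
        if phi(ws[md]):
            hi = md
        else:
            lo = md
    return ws[hi]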

Time Complexity:

O(R N \log \max c) or O(R N \log N)

AUTHOR’S AND TESTER’S SOLUTIONS:

setter
tester


I used the same technique as described in the quick explanation. But when I implemented my solution using recursion, I was getting WA / RE. I tried hard to locate any bugs in my code but could not find any.

Finally I had to implement the same idea using a queue and loops to get accepted. Can anyone suggest what the reason could be?

My solution : CodeChef: Practical coding for everyone (WA)

Same Idea : CodeChef: Practical coding for everyone (RE)

Thanks.

There is absolutely no need for stuff like sliding RMQ - it’s kinda well-known that the naive solution runs in O(NR), unless you intentionally implemented it in such an ugly way that it takes O(NR^2). Here is an example: solution.

Good guy lord_f. He writes a really easy-to-understand solution.

The tester’s and author’s solutions are for a different problem.


@triveni I think you are exceeding the maximum recursion limit. I am having the same problem currently and as I keep decreasing the amount of memory I am allocating in my stack, the depth of my dfs is increasing.

@vastolorde95: Thanks. I too thought that the stack limit might be causing the issue, but while I was expecting a memory limit error or some runtime error, it was instead giving me WA. That was the main reason for my confusion. A similar thing happened to some of my friends in Facebook Hacker Cup round 1, so I was aware of this issue. I’m still puzzled as to why it’s WA and not some RE for exceeding the stack limit!

@triveni I think there are some boundary cases which are causing the WAs, since you get WA on subtask 1, in which the number of nodes is very small (<= 1000) and unlikely to cause a stack overflow. I haven’t solved the problem yet, so I can’t help in finding the cases.

Can someone please tell what’s wrong with this approach:

  • Let (a, b, c) be an edge, and let’s try to see whether this edge can be the median under the given constraints; then we’ll take the minimum over all such feasible edges.

  • Fix an edge E = (a, b, c). Do a DFS on the component of b (excluding a) and compute a map <diff, list of p>, where p = no. of edges < E, q = no. of edges > E, and diff = p - q for the given DFS path.

  • Do a DFS on the component of a (excluding b) and compute, say, x (count of edges < E) and y (count of edges > E) in this DFS. Now, using x and y, we should be able to check whether some p, q exist such that x + p == y + q or x + p + 1 == y + q.

  • Complexity = edges * R * log(N)

Does this sound right?