PROBLEM LINK:
Author: Sergey Kulik
Testers: Vasya Antoniuk and Kevin Atienza
Translators: Vasya Antoniuk (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza
DIFFICULTY:
Medium
PREREQUISITES:
Trees, BFS/DFS, memoization, tree diameter
PROBLEM:
For every edge in a weighted tree, compute the diameters of the resulting subtrees if the edge is removed.
QUICK EXPLANATION:
For every edge (a,b):
 Let L(a,b) be the length of the edge.
 Let H(a,b) be the height of the tree rooted at b if the edge (a,b) is removed, plus L(a,b).
 Let D(a,b) be the diameter of the tree rooted at b if the edge (a,b) is removed.
We have the following formulas (missing terms are taken to be 0):

H(a,b) = L(a,b) + H(b,d_1)
D(a,b) = max(D(b,c_1), H(b,d_1) + H(b,d_2))

where:
 c_1 ≠ a is the neighbor of b with the maximum D(b,c).
 d_1 ≠ a is the neighbor of b with the maximum H(b,d).
 d_2 ≠ a is the neighbor of b with the second maximum H(b,d).
Note that H(a,b) is not the same as H(b,a)! Similarly, D(a,b) is not the same as D(b,a). But there are at most 2(N-1) valid arguments to H(a,b) and D(a,b), two for each edge, so they can be memoized.
Be careful that your code doesn’t run in O(N^2) if there’s a node with a high degree!
EXPLANATION:
Diameter of a tree
Let’s first discuss how to compute the diameter of a tree, without removing any edge. Recall that the diameter of a tree is the length of its longest path. Actually, the word “diameter” can also refer to the longest path itself, but hopefully it will be clear from context which sense of the word is being used.
In problems involving unrooted trees, it’s usually beneficial to root the tree at some node, because rooting the tree gives us a place where to start the computation. So in our case, let’s root the tree at some arbitrary node, say node 1.
By rooting the tree, we can now distinguish two cases for the diameter: Either it passes through the root or not.
 If it doesn’t pass through the root, then this diameter is completely contained in some subtree. So it must also be a diameter of some subtree, because the diameter of a subtree cannot be longer than the diameter of the whole tree.
 If it passes through the root, then the path consists of an upward path and a downward path, which meet at the root. But for this to be the longest path, both parts must be as long as possible. This means they must start/end at some leaf (so their lengths must be the heights of the corresponding subtrees plus an additional edge towards the root), and also they must be the longest two such paths.
We can now formalize these cases into an algorithm to compute the diameter. We introduce some notation. For every edge (a,b), let L(a,b) be the length of the edge. Also,
 Let H(b) be the height of the tree rooted at b.
 Let D(b) be the diameter of the tree rooted at b.
Then we have the following formulas (missing terms are taken to be 0, e.g., when b is a leaf):

H(b) = L(b,d_1) + H(d_1)
D(b) = max(D(c_1), L(b,d_1) + H(d_1) + L(b,d_2) + H(d_2))

where
 c_1 is the child of b with the maximum D(c).
 d_1 is the child of b with the maximum L(b,d) + H(d).
 d_2 is the child of b with the second maximum L(b,d) + H(d).
This can easily be implemented as a recursive procedure that starts at node 1! This runs in linear time in the input.
As an additional note, in some languages this will trigger stack overflow because the recursive calls may go too deep. We’ll discuss that later.
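The recursive computation above can be sketched as follows in Python (the function and variable names are mine; nodes are assumed to be numbered 1..N and edges given as (a, b, w) triples):

```python
import sys
from collections import defaultdict

def tree_diameter(n, edges):
    """Diameter of a weighted tree with nodes 1..n, edges as (a, b, w)."""
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    sys.setrecursionlimit(max(1000, 2 * n))

    def dfs(b, parent):
        # Returns (H(b), D(b)): height and diameter of the subtree rooted at b.
        diameter = 0
        best1 = best2 = 0  # two largest values of L(b,d) + H(d) over children d
        for d, w in adj[b]:
            if d == parent:
                continue
            h_d, dia_d = dfs(d, b)
            diameter = max(diameter, dia_d)      # case 1: diameter avoids b
            cand = w + h_d
            if cand > best1:
                best1, best2 = cand, best1
            elif cand > best2:
                best2 = cand
        diameter = max(diameter, best1 + best2)  # case 2: path through b
        return best1, diameter

    return dfs(1, 0)[1]
```

For example, on the path 1 -(3)- 2 -(4)- 3 this returns 7. Note the recursion depth equals the tree height, which is exactly the stack-overflow concern discussed later.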
Edge removals
Now, what happens if we remove an edge? If we continue considering the tree as rooted on node 1, then computing the diameter of one of the trees is easy, because it is just D(b). However, the other tree’s diameter isn’t as easy to compute.
It would be nice if we can select which root we want any time. For example, suppose we remove the edge (a,b). Then it would be nice if we can have a and b be the roots of the two resulting trees, so that we can perform a similar traversal above. Specifically, let’s generalize our H(b) and D(b) functions above. If (a,b) is an edge, then:
 Let H(a,b) be the height of the tree rooted at b if the edge (a,b) is removed, plus L(a,b).
 Let D(a,b) be the diameter of the tree rooted at b if the edge (a,b) is removed.
H(a,b) and D(a,b) satisfy formulas similar to the ones above (missing terms are again taken to be 0):

H(a,b) = L(a,b) + H(b,d_1)
D(a,b) = max(D(b,c_1), H(b,d_1) + H(b,d_2))

where:
 c_1 is the neighbor of b that is not equal to a and has the maximum D(b,c).
 d_1 is the neighbor of b that is not equal to a and has the maximum H(b,d).
 d_2 is the neighbor of b that is not equal to a and has the second maximum H(b,d).
Notice that we adjusted the H(a,b) function to also include L(a,b) so the formulas simplify a bit.
Now, suppose we remove the edge (a,b). Then the diameters of the resulting trees are D(a,b) and D(b,a). If we compute each of these from scratch, we would need O(N) time per edge. But there are N-1 edges, so this will take O(N^2) time, which is too slow.
The key here is to notice that since there are N-1 edges, there are only 2(N-1) possible arguments to the functions H(a,b) and D(a,b), two for each edge. Also, we only need to compute each H(a,b) and D(a,b) once. This suggests that a memoization approach might work. Specifically, we compute H(a,b) and D(a,b) using the above formulas, but after computing them once, we store them so we don't have to compute them again! Since there are only 2(N-1) possible arguments, each value is computed at most once, which is potentially much faster.
As an illustration, here is some pseudocode to compute H(a,b):
def H(a,b):
    if H(a,b) has already been computed before:
        return the stored value for H(a,b)
    else:
        result = 0
        for every neighbor c of b:
            if c != a:
                result = max(result, H(b,c))
        result = result + L(a,b)   # H(a,b) includes the length of the removed edge
        store 'result' as the result of H(a,b) (for later use)
        return result
The pseudocode for D(a,b) is very similar.
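Putting the formulas and the memoization together, here is a minimal Python sketch of this (still potentially quadratic, as discussed next) approach. The tree representation and all names are my own assumptions:

```python
from collections import defaultdict

def removal_diameters_naive(n, edges):
    """For each edge (a, b), return (D(a,b), D(b,a)) via memoized recursion.
    Computing H/D for every neighbor a of a node b costs O(deg(b)) each,
    so a high-degree node makes this O(N^2) in the worst case."""
    adj = defaultdict(list)
    length = {}
    for a, b, w in edges:
        adj[a].append(b)
        adj[b].append(a)
        length[(a, b)] = length[(b, a)] = w

    H_memo, D_memo = {}, {}

    def H(a, b):
        # Height of b's component (with edge (a,b) removed), plus L(a,b).
        if (a, b) not in H_memo:
            best = 0
            for c in adj[b]:
                if c != a:
                    best = max(best, H(b, c))
            H_memo[(a, b)] = length[(a, b)] + best
        return H_memo[(a, b)]

    def D(a, b):
        # Diameter of b's component when edge (a,b) is removed.
        if (a, b) not in D_memo:
            best_dia = 0
            top1 = top2 = 0  # two largest H(b, d) over neighbors d != a
            for c in adj[b]:
                if c != a:
                    best_dia = max(best_dia, D(b, c))
                    h = H(b, c)
                    if h > top1:
                        top1, top2 = h, top1
                    elif h > top2:
                        top2 = h
            D_memo[(a, b)] = max(best_dia, top1 + top2)
        return D_memo[(a, b)]

    return [(D(a, b), D(b, a)) for a, b, _ in edges]
```

(Recursion depth is proportional to the tree height here, so for the actual constraints an iterative rewrite or an enlarged stack is needed; see the last section.)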
Handling large degrees
You might be tempted to say that the solution above already runs in O(N) time, so it should already pass the time limit. Unfortunately, it’s not quite O(N) yet. To see why, notice that computing H(a,b) and D(a,b) actually runs in O(\operatorname{deg} b) time, where \operatorname{deg} b is the degree of b. H(a,b) and D(a,b) will have to be computed for every neighbor a of b. But since b has \operatorname{deg} b neighbors, these computations run in O((\operatorname{deg} b)^2) time overall. If b has a lot of neighbors, this is bad. In fact, b can have up to O(N) neighbors, so this runs in O(N^2) time in the worst case!
The bottleneck in computing H(a,b) and D(a,b) is finding three particular neighbors of b: c_1, d_1 and d_2. Unfortunately, these three nodes also depend on a, which means we have to check all neighbors of b for each a. Let’s write these nodes as c_1(a), d_1(a) and d_2(a) instead, to emphasize their dependence on a (of course they depend on b as well). For example, c_1(a) is defined as the neighbor of b that is not equal to a and has the maximum D(b,c).
But if you look closer, for many $a$s, c_1(a) is actually the same node! Namely, let c_1' be the neighbor of b that has the maximum D(b,c) overall. Then for most neighbors a of b, the node c_1(a) is the same as c_1'. The only time this isn’t true is when a is c_1' itself! Similarly, let d_1' and d_2' be the neighbors of b that have the maximum and second maximum H(b,d), respectively. Then for most neighbors a of b, the node d_1(a) is the same as d_1' and d_2(a) is the same as d_2'. The only time this isn’t true is when a is equal to d_1' or d_2'.
How can we use this? Well, as said above,
 If a ≠ c_1', then c_1(a) is simply c_1'.
 But if a = c_1', then c_1(a) is actually c_2', which we can define as the neighbor of b that has the second maximum D(b,c).
This means that all we need to keep track of are just two neighbors of b with the highest D(b,c)!
Similarly, if we define d_3' as the neighbor of b that has the third maximum H(b,d), then:
 If a ≠ d_1' and a ≠ d_2', then d_1(a) = d_1' and d_2(a) = d_2'.
 If a = d_1', then d_1(a) = d_2' and d_2(a) = d_3'.
 If a = d_2', then d_1(a) = d_1' and d_2(a) = d_3'.
So now we can come up with a strategy to prevent the worst case O((\operatorname{deg} b)^2):
 The first time we visit b, i.e., when we need to compute H(a,b) and D(a,b), we will compute c_1(a), d_1(a) and d_2(a) normally. This runs in O(\operatorname{deg} b) time. At this point, we have computed D(b,c) and H(b,c) for all neighbors c of b, except for a itself. We store these values. We also store a, the neighbor of b from which we first visited b.
 The second time we visit b, i.e., when we need to compute H(a',b) and D(a',b), we additionally compute D(b,a) and H(b,a). At this point, we have computed D(b,c) and H(b,c) for all neighbors c of b, including a, so we can now compute c_1', c_2', d_1', d_2' and d_3', and using the rules above we can determine c_1(a'), d_1(a') and d_2(a').
 For all subsequent visits H(a'',b) and D(a'',b), we can now use the above rules to determine c_1(a''), d_1(a'') and d_2(a'') in O(1) time!
This reduces the overall running time per node b to just O(\operatorname{deg} b), and the overall running time of the algorithm to O(N)!
As a side note, d_3' (and possibly d_2' and c_2') is undefined if b has fewer than three neighbors. In this case, we assign them to a dummy node c' having H(b,c') = D(b,c') = 0.
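The d_1'/d_2'/d_3' bookkeeping can be illustrated with a small hypothetical helper: given H(b,c) for all neighbors c of a fixed node b, it precomputes the top three once, and then answers "best two excluding a" queries in O(1) each (exactly the rules listed above):

```python
def top3_excluding(values):
    """values[c] = H(b, c) for each neighbor c of some fixed node b.
    Returns best_two(a) = the two largest H(b, d) with d != a, in O(1)."""
    # Keep the three largest (value, neighbor) pairs; pad with dummy zeros
    # so nodes with fewer than three neighbors are handled uniformly.
    top = sorted(((v, c) for c, v in values.items()), reverse=True)[:3]
    while len(top) < 3:
        top.append((0, None))  # dummy neighbor with H = 0
    (v1, c1), (v2, c2), (v3, _) = top

    def best_two(a):
        if a == c1:
            return v2, v3  # d_1' is excluded: shift both down
        if a == c2:
            return v1, v3  # d_2' is excluded: d_3' takes its place
        return v1, v2      # the common case
    return best_two
```

The same two-largest trick (with just c_1' and c_2') answers the c_1(a) queries.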
Stack overflow
As a final note, the approach above may fail because the recursive calls become too deep. Note that N can reach 5\cdot 10^6 which is larger than usual, so in many languages a stack overflow may occur. There are a few ways to get around this:

Enlarge the runtime stack manually. In C++, the following can be used:
#include <sys/resource.h>
...
void enlarge_stack() {
    struct rlimit lim;
    getrlimit(RLIMIT_STACK, &lim);
    lim.rlim_cur = lim.rlim_max;
    setrlimit(RLIMIT_STACK, &lim);
}
In Python, we can use
sys.setrecursionlimit(BIG_NUMBER)
, but note that you usually can't set this to a number as high as this problem requires, since the interpreter's underlying stack can still overflow. So this method might not work. In other languages, there might not even be ways to enlarge the stack at runtime. The setter's solution uses this method.

Convert the algorithm to an iterative one, by simulating the runtime stack yourself. This is trickier to implement, but has the added benefit that it will work in most languages. The tester’s solution below uses this.
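As a sketch of the stack-simulation idea, here is an iterative post-order DFS (in Python, with assumed names) that computes the rooted heights H(b) without recursion; the same transformation applies to the full memoized solution:

```python
def heights_iterative(n, edges, root=1):
    """Compute H[b] for a tree rooted at `root`, with an explicit stack
    instead of recursion, so arbitrarily deep trees cause no stack overflow."""
    adj = [[] for _ in range(n + 1)]
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    # First pass: record nodes in DFS discovery (pre-)order.
    order = []
    stack = [(root, 0)]
    while stack:
        b, parent = stack.pop()
        order.append((b, parent))
        for c, _ in adj[b]:
            if c != parent:
                stack.append((c, b))

    # Second pass: reversed preorder visits every child before its parent,
    # so each H[c] is ready when H[b] is computed.
    H = [0] * (n + 1)
    for b, parent in reversed(order):
        best = 0
        for c, w in adj[b]:
            if c != parent:
                best = max(best, w + H[c])
        H[b] = best
    return H
```

This two-pass pattern (collect a traversal order, then process it in reverse) is often simpler than simulating the recursion's call frames directly.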
Time Complexity:
O(N)