SETELE - Editorial

kevinsogo · November 20, 2016, 9:43pm

PROBLEM LINK:

Author: Istvan Nagy
Tester: Kevin Atienza
Translators: Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Minimum spanning tree, Kruskal’s algorithm, union-find

PROBLEM:

You are given a weighted undirected tree. Two nodes are randomly chosen to be connected with an edge of 0 cost. Find the expected cost of the MST of the resulting graph.

QUICK EXPLANATION:

We want the value C_{MST} - \frac{S}{T} where

C_{MST} is the MST cost of the original graph (i.e. just the sum of the costs of all edges).
S is the sum of the largest edge weight across all paths.
T is the number of paths (i.e. N(N-1)/2).

S is the only nontrivial thing to compute. Perform Kruskal’s algorithm, but for each connected component, also keep track of its size. Then, S is the sum of \mathrm{size}(a)\cdot \mathrm{size}(b)\cdot c for every step (a,b,c) of the Kruskal algorithm.

EXPLANATION:

Updating the MST

Suppose we want to connect nodes i and j with an edge of 0 cost. How will the MST be updated?

Here’s one equivalent characterization of a (finite) tree: A tree is an acyclic graph with exactly N-1 edges. This gives us some clear requirements on how to turn the graph back into a tree again. Specifically, upon adding the edge (i,j,0):

We are creating exactly one cycle, namely the cycle i \rightarrow \ldots \rightarrow j \rightarrow i, where the \ldots represents the original path from i to j. Thus, we need to remove at least one edge from this cycle.
The number of edges is now N, which means we must remove exactly one edge.

Together, this means that we must remove one edge from that cycle. Which edge? Well, we want the resulting graph to have the smallest possible cost, so naturally we want to remove the edge with the largest cost!

To summarize, by adding the edge (i,j,0), the cost of the MST reduces by exactly the largest edge weight in the path from i to j!

Expected value of the new MST

To answer the question, we want to find the expected cost of the new MST. Let C_{MST} be the cost of the original tree. Then from the above, and from the fact that (i,j) is uniformly chosen, we see that this expected value is simply C_{MST} - \frac{S}{T}, where

S is the sum of the largest edge weight across all paths.
T is the number of paths (i.e. \binom{N}{2} = N(N-1)/2).

T is pretty easy to compute, (just be careful with overflow!) so all that remains is to compute S. Unfortunately, naïve ways of computing this would be too slow, because there are O(N^2) paths! So instead of computing the sum across paths, let’s try to compute the sum across all edges, and just figure out how many paths have that edge as the maximum-cost edge. In other words,

S = \sum_{\text{$(a,b,c)$ is an edge}} c\cdot F(a,b,c)

where F(a,b,c) is the number of paths whose largest-cost edge is (a,b,c).

How do we compute F(a,b,c)? Let’s consider some path x \leftrightarrow y. This path has (a,b,c) as its largest-cost edge if and only if the following two conditions are satisfied:

x is connected to a with edges of cost < c.
y is connected to b with edges of cost < c.

(Note that it might be the other way around, i.e. x is connected to b and y is connected to a, but then again, paths are symmetric, so x \leftrightarrow y should be considered the same as y \leftrightarrow x.)

So we find that F(a,b,c) is simply R(a,c)\cdot R(b,c), where R(x,c) is the number of nodes reachable from x with edges of cost < c. But how do we compute R(x,c)? Amazingly, Kruskal’s algorithm can help us here. Remember that Kruskal’s algorithm considers the edges in increasing order and unites the nodes into components in that order. This means that, during the step where we process the edge (a,b,c), R(a,c) and R(b,c) are simply the sizes of the components currently containing a and b!

This gives us the following solution: Perform Kruskal’s algorithm, but for each connected component, also keep track of its size. Then, S is the sum of \mathrm{size}(a)\cdot \mathrm{size}(b)\cdot c for every step (a,b,c) of the Kruskal algorithm!

In pseudocode:

size[1..N] = [1,1,...,1]
parent[1..N] = [1,2,...,N]

# 'find' operation in 'union-find'
def find(n):
    return parent[n] == n ? n : parent[n] = find(parent[n])


S = T = C = 0
for all edges (a,b,c) sorted according to c: 
    # find
    a = find(a)
    b = find(b)

    # update values
    S += size[a]*size[b]*c
    T += size[a]*size[b]
    C += c

    # union
    if size[a] < size[b]:
        parent[a] = b
        size[b] += size[a]
    else:
        parent[b] = a
        size[a] += size[b]

print C - S/T  # use exact division!

Time Complexity:

O(N \log N)

AUTHOR’S AND TESTER’S SOLUTIONS:

setter
tester
editorialist

bishalg · November 21, 2016, 8:52am

This is what I had did in contest time: link text and this is after contest : link text
Differences: second last line
Moral: Dont use IDE which shows 0.0000000 if u use printf and if u use cout for double , dont forget to write cout<<fixed before setprecision !!

omjego · November 21, 2016, 9:00am

Nice problem with really nice editorial.Learned a new way to approach problems.

ankurverma1994 · November 21, 2016, 10:58am

I came up with the solution during contest time but was struggling to find how many paths have that edge as the maximum-cost edge. I was thinking of precomputing something with DFS so that all queries can be answered quickly but could not come up with so. My bad I could not think that Kruskal algorithm can help here.

Between did any one managed to solve without Kruskal algorithm i.e. by applying some DFS or something?

Thanks for the question and nice editorial learned something new from it.

mayankmax80 · November 22, 2016, 6:44pm

I’m unable to understand the editorial.

forhadsidhu · November 24, 2016, 12:04pm

why check SIZE[a]<size[b]?why we cannot dirrectly use size[a]+=size[b] parrent[a]=b? any onle plz

codedecode0111 · November 21, 2016, 3:54am

I think setter’s code is not meant for this problem.

kevinsogo · November 26, 2016, 10:31am

This is called “union by size”. Without it, the worst case for union-find becomes O(n). (and thus the solution becomes O(n^2) in the worst case.)