You are not logged in. Please login at to post your questions!


SETELE - Editorial




Author: Istvan Nagy
Tester: Kevin Atienza
Translators: Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza




Minimum spanning tree, Kruskal's algorithm, union-find


You are given a weighted undirected tree. Two nodes are randomly chosen to be connected with an edge of $0$ cost. Find the expected cost of the MST of the resulting graph.


We want the value $C_{MST} - \frac{S}{T}$ where

  • $C_{MST}$ is the MST cost of the original graph (i.e. just the sum of the costs of all edges).
  • $S$ is the sum of the largest edge weight across all paths.
  • $T$ is the number of paths (i.e. $N(N-1)/2$).

$S$ is the only nontrivial thing to compute. Perform Kruskal's algorithm, but for each connected component, also keep track of its size. Then, $S$ is the sum of $\mathrm{size}(a)\cdot \mathrm{size}(b)\cdot c$ for every step $(a,b,c)$ of the Kruskal algorithm.


Updating the MST

Suppose we want to connect nodes $i$ and $j$ with an edge of $0$ cost. How will the MST be updated?

Here's one equivalent characterization of a (finite) tree: A tree is an acyclic graph with exactly $N-1$ edges. This gives us some clear requirements on how to turn the graph back into a tree again. Specifically, upon adding the edge $(i,j,0)$:

  • We are creating exactly one cycle, namely the cycle $i \rightarrow \ldots \rightarrow j \rightarrow i$, where the $\ldots$ represents the original path from $i$ to $j$. Thus, we need to remove at least one edge from this cycle.
  • The number of edges is now $N$, which means we must remove exactly one edge.

Together, this means that we must remove one edge from that cycle. Which edge? Well, we want the resulting graph to have the smallest possible cost, so naturally we want to remove the edge with the largest cost!

To summarize, by adding the edge $(i,j,0)$, the cost of the MST reduces by exactly the largest edge weight in the path from $i$ to $j$!

Expected value of the new MST

To answer the question, we want to find the expected cost of the new MST. Let $C_{MST}$ be the cost of the original tree. Then from the above, and from the fact that $(i,j)$ is uniformly chosen, we see that this expected value is simply $C_{MST} - \frac{S}{T}$, where

  • $S$ is the sum of the largest edge weight across all paths.
  • $T$ is the number of paths (i.e. $\binom{N}{2} = N(N-1)/2$).

$T$ is pretty easy to compute, (just be careful with overflow!) so all that remains is to compute $S$. Unfortunately, naïve ways of computing this would be too slow, because there are $O(N^2)$ paths! So instead of computing the sum across paths, let's try to compute the sum across all edges, and just figure out how many paths have that edge as the maximum-cost edge. In other words,

$$S = \sum_{\text{$(a,b,c)$ is an edge}} c\cdot F(a,b,c)$$

where $F(a,b,c)$ is the number of paths whose largest-cost edge is $(a,b,c)$.

How do we compute $F(a,b,c)$? Let's consider some path $x \leftrightarrow y$. This path has $(a,b,c)$ as its largest-cost edge if and only if the following two conditions are satisfied:

  • $x$ is connected to $a$ with edges of cost $< c$.
  • $y$ is connected to $b$ with edges of cost $< c$.

(Note that it might be the other way around, i.e. $x$ is connected to $b$ and $y$ is connected to $a$, but then again, paths are symmetric, so $x \leftrightarrow y$ should be considered the same as $y \leftrightarrow x$.)

So we find that $F(a,b,c)$ is simply $R(a,c)\cdot R(b,c)$, where $R(x,c)$ is the number of nodes reachable from $x$ with edges of cost $< c$. But how do we compute $R(x,c)$? Amazingly, Kruskal's algorithm can help us here. Remember that Kruskal's algorithm considers the edges in increasing order and unites the nodes into components in that order. This means that, during the step where we process the edge $(a,b,c)$, $R(a,c)$ and $R(b,c)$ are simply the sizes of the components currently containing $a$ and $b$!

This gives us the following solution: Perform Kruskal's algorithm, but for each connected component, also keep track of its size. Then, $S$ is the sum of $\mathrm{size}(a)\cdot \mathrm{size}(b)\cdot c$ for every step $(a,b,c)$ of the Kruskal algorithm!

In pseudocode:

size[1..N] = [1,1,...,1]
parent[1..N] = [1,2,...,N]

# 'find' operation in 'union-find'
def find(n):
    return parent[n] == n ? n : parent[n] = find(parent[n])

S = T = C = 0
for all edges (a,b,c) sorted according to c: 
    # find
    a = find(a)
    b = find(b)

    # update values
    S += size[a]*size[b]*c
    T += size[a]*size[b]
    C += c

    # union
    if size[a] < size[b]:
        parent[a] = b
        size[b] += size[a]
        parent[b] = a
        size[a] += size[b]

print C - S/T  # use exact division!

Time Complexity:

$O(N \log N)$



This question is marked "community wiki".

asked 20 Nov '16, 21:43

kevinsogo's gravatar image

accept rate: 11%

edited 22 Nov '16, 16:36

admin's gravatar image

0★admin ♦♦


I think setter's code is not meant for this problem.

(21 Nov '16, 03:54) codedecode01115★

Nice problem with really nice editorial.Learned a new way to approach problems.


answered 21 Nov '16, 09:00

omjego's gravatar image

accept rate: 28%

I came up with the solution during contest time but was struggling to find how many paths have that edge as the maximum-cost edge. I was thinking of precomputing something with DFS so that all queries can be answered quickly but could not come up with so. My bad I could not think that Kruskal algorithm can help here.
Between did any one managed to solve without Kruskal algorithm i.e. by applying some DFS or something?
Thanks for the question and nice editorial learned something new from it.


answered 21 Nov '16, 10:58

ankurverma1994's gravatar image

accept rate: 8%

edited 21 Nov '16, 10:59

This is what I had did in contest time: link text and this is after contest : link text Differences: second last line Moral: Dont use IDE which shows 0.0000000 if u use printf and if u use cout for double , dont forget to write cout<<fixed before setprecision !!


answered 21 Nov '16, 08:52

bishalg's gravatar image

accept rate: 0%

I'm unable to understand the editorial.

This answer is marked "community wiki".

answered 22 Nov '16, 18:44

mayankmax80's gravatar image

accept rate: 0%

why check SIZE[a]<size[b]?why we cannot dirrectly use size[a]+=size[b] parrent[a]=b? any onle plz


answered 24 Nov '16, 12:04

forhadsidhu's gravatar image

accept rate: 0%


This is called "union by size". Without it, the worst case for union-find becomes O(n). (and thus the solution becomes O(n^2) in the worst case.)

(26 Nov '16, 10:31) kevinsogo7★
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported
  • mathemetical formulas in Latex between $ symbol

Question tags:


question asked: 20 Nov '16, 21:43

question was seen: 4,105 times

last updated: 26 Nov '16, 10:31