You are not logged in. Please login at to post your questions!


CUTTREE - Editorial




Author: Jatin Yadav
Tester: Triveni Mahatha
Editorialist: Adarsh Kumar




Fast Fourier Transform, Centroid decomposition, Properties of Expectation


You are given a tree with $N$ nodes. On each of the next $N-1$ days, you remove one of the remaining edges to create a forest. The strength of a forest is defined to be the sum of squares of the sizes of its connected components.

You need to compute the expected value of the strength of the resulting forest after day $i$ for all $0 \le i < n$, modulo $1000000007$.


Lets try to find the expression for $E[x^2,i]$. Here $E[x^2,i]$ represent the expected value of the strength of the resulting forest after day $i$.

Say, after $i$ days, there is a component of size $x$ consisting of vertices $(u_1,u_2,...,u_x)$. We know that, it will contribute $x^2$ to the answer with some probability. Now, what can be the other way to look at it?

We can decompose $x^2$ in the following fashion, where each $f(u_k)=1$:

$E[x^2,i] = E[(\sum_{k=1}^x f(u_k))^2,i]$

$\Rightarrow E[x^2,i] = E[\sum_{k=1}^x f(u_k)^2,i] + 2E[\sum_{a=1}^x\sum_{b=a+1}^xf(u_a)f(u_b),i]$

$\Rightarrow E[x^2,i] = E[x,i] + 2E[\sum_{a=1}^x\sum_{b=a+1}^xf(u_a)f(u_b),i]$

$\Rightarrow E[x^2,i] = n + 2E[u-v,i]$

Now, the main problem reduces to finding the expected number of ordered pairs $(u,v)$ which are connected after $i$ days, indicated by $E[u-v, i]$. This can be solved using linearity of expectation. Basically, we need to find what is the probability that $u-v$ are connected after $i$ days? If we sum this probability for all the ordered pairs, we are done.

What is the probability that $u,v$ are in the same connected component after $i$ days?

For $u,v$ to be in same component all the edges on the shortest path from $u$ to $v$ must not be removed in $i$ days. Let the number of edges between $u$ and $v$ on their shortest path be $k$ (i.e. distance between $u$ and $v$ is $k$). Then the probability $P(u-v,i)$ can be computed as:

$P(u-v,i) = \frac{\binom{n-1-i}{k}(k!)((n-1-k)!)}{(n-1)!}$

Notice that, above expression for probability only depends on the distance between $u$ and $v$. If we can somehow compute number of ordered pairs at distance $k$ as $D[k]$ for all $1 \le k < n$. We will take $D[0]=0$. Then, we can write the formula for expectation as:

$E[u-v,i] = \sum_{k=0}^{n-1-i}D[k]\frac{\binom{n-1-i}{k}(k!)((n-1-k)!)}{(n-1)!}$

Counting the number of ordered pairs at distance $k$ for all $1 \le k < n$.

This is a standard problem and can be done in $O(nlog^2n)$ with the help of centroid decomposition and FFT. Following are the step for doing this:

  1. Build the centroid tree of given tree.
  2. Traverse each node of centroid tree. When on a particular node (say P):
    • You need to count distance between two nodes present in different subtrees. For doing the same:
      1. Build a polynomial for each of the subtree. Polynomial should contain $\sum_{\text{d $\in$ subtree}} x^{\text{distance between P and d}}$.
      2. Add all the polynomial created for all the subtree and square it. This will overcount those pairs which are present in same subtrees.
      3. Subtract the square of each of the subtree polynomial to remove overcounting.
      4. Still we have every pair counted two times. So divide by two, all the coefficient of polynomial.
      5. We now have a polynomial in which coefficient of $x^k$ represents the number of paths of length $k$ passing through this node and present in different subtrees.
    • You need to count distance between P and all other nodes.
      1. Traverse all the subtree and update the count naively.
You can also seek editorial of PRIMEDST for more explanation.

Computing the value of $E[u-v,i]$ modulo $1000000007$ for all $0 \le i < n$.

Naive computation of $E[u-v,i]$ for all $0 \le i < n$ will take $O(n^2)$ time, which is not feasible here. Lets take a closer look at the formula for $E[u-v,i]$ now:

$E[u-v,i] = \sum_{k=0}^{n-1-i}D[k]\frac{\binom{n-1-i}{k}(k!)((n-1-k)!)}{(n-1)!}$

$\Rightarrow E[u-v,i] = \frac{(n-1-i)!}{(n-1)!} \sum_{k=0}^{n-1-i}D[k]\frac{(n-1-k)!}{(n-1-i-k)!}$

$\Rightarrow E[u-v,i] = \frac{(n-1-i)!}{(n-1)!} H(i)$



$\Rightarrow H(i)=\sum_{k=0}^{n-1-i}(D[k] (n-1-k)! )\left ( \frac{1}{(n-1-i-k)!} \right )$

Observe that $H(i)$ for all $0 \le i < n$ can now be computed from convolution of polynomial $A(x)$ and $B(x)$ where:

$A(x) = (D[0] (n-1)!) x^0 + (D[1] (n-2)!) x^1 + ... (D[r] (n-1-r)!) x^r + ... + (D[n-1] 0!) x^{n-1}$, and

$B(x) = \left(\frac{1}{0!}\right) x^0 + \left(\frac{1}{1!}\right) x^1 + ... \left(\frac{1}{r!}\right) x^r + ... + \left(\frac{1}{(n-1)!}\right) x^{n-1}$

Value of $H(i)$ is the coefficient of $x^{n-1-i}$ in $A(x)*B(x)$ modulo $1000000007$.

Multiplication of two polynomials with arbitrary modulo.

We have to multiply two polynomials and then output result coefficients modulo $M$ which is not necessary of the kind $c · 2^k + 1$. Coefficients in multiplication can be up to $O(M^2n)$ which usually can’t be handled precisely enough by floating point types. You can have a look at this pdf, in order to solve this problem efficiently in $O(nlogn)$.

Time Complexity:



Setter's solution
Tester's solution
Editorialist's solution

This question is marked "community wiki".

asked 28 Feb, 14:03

adkroxx's gravatar image

accept rate: 7%

edited 13 Mar, 15:23

admin's gravatar image

0★admin ♦♦

Let me try to write an explation for the expected strength of a forest.

We start with a tree with $n$ nodes, and randomly decompose it into a forest of smaller trees.

Let $T$ be the set of possible trees that can be in any random forest.
Use $t$ to denote a tree in the set $T$.
Use $u$ and $v$ to denote nodes in a tree.
Write $S_t$ for the strength of tree $t$. From the question, $S_t$ is the square of the number of nodes in $t$.
Write $p(t)$ for the probability that tree $t$ is present in the random forest.

And let $E$ be the expected strength of the random forest.

We should start by summing the product of the probability of each random forest and the strengths of the trees. Then gather together the probability for each possible tree, and we will have \begin{equation} E = \sum_{t \in T} S_t \times p(t) \end{equation} The difficulty is that calculating $p(t)$ would be tricky.

We will now go through a complicated looking process so that we can split $S_t$ into a sum of smaller pieces. Then we will be able to swap the order of summations, and perform a sum over $t \in T$ to simplify the expression for $E$.

Define an indicator function $I(t,u)$ which has value 1 if node $u$ is in tree $t$, and 0 otherwise. Then, for a fixed tree $t$ with $r$ nodes, and summing over all nodes in the forest \begin{equation} \sum_u I(t,u) = r \end{equation} This last expression is a complicated way to say that there are just $r$ possible nodes $u$ where $I(t,u)=1$.

Similarly, define another indicator function $I(t,u,v)$ to have value 1 if both nodes $u$ and $v$ are in tree $t$, and 0 otherwise. Then, for fixed tree $t$ with $r$ nodes, and summing over all ordered pairs of distinct nodes in the forest, \begin{equation} \sum_{u,v, u \ne v} I(t,u,v) = r(r-1) \end{equation} Again, this expression is just saying that there are $r(r-1)$ pairs of nodes $u,v$ in tree $t$.

We can now rewrite $S_t$ as two sums: \begin{equation} S_t = r^2 = r + r(r-1) = \sum_u I(t,u) + \sum_{u,v, u \ne v} I(t,u,v) \end{equation}

And substituting into the expression for $E$, \begin{equation} E = \sum_{t \in T} \left( \sum_{u} I(t,u) + \sum_{u,v, u \ne v} I(t,u,v) \right) \times p(t) \end{equation} We remove the brackets to get \begin{equation} E = \sum_{t \in T} \sum_{u} I(t,u) p(t) + \sum_{t \in T} \sum_{u,v, u \ne v} I(t,u,v) p(t) \end{equation}

The sums are all finite, so the order of the $\sum$s can be swapped. Thus \begin{equation} E = \sum_{u} \sum_{t \in T} I(t,u) p(t) + \sum_{u,v, u \ne v} \sum_{t \in T} I(t,u,v) p(t) \end{equation}

In any of the possible random forests, a fixed node $u$ is always in exactly one tree. So we have \begin{equation} \sum_{t \in T} I(t,u) p(t) = 1 \end{equation} There are $n$ nodes in the forest, so summing over all possible nodes gives \begin{equation} \sum_{u} \sum_{t \in T} I(t,u) p(t) = \sum_{u} 1 = n \end{equation}

Similar argument (which I am struggling to express) gives \begin{equation} \sum_{t \in T} I(t,u,v) \times p(t) = {\rm probability}~u~{\rm and}~v~{\rm are~in~same~connected component} \end{equation}

And so we have the required result \begin{equation} E = n + \sum_{u,v, u \ne v} {\rm probability}~u~{\rm and}~v~{\rm are~in~the~same~connected~ component} \end{equation}

The sum is over ordered pairs $(u,v)$ with $u \ne v$. If considering un-ordered pairs, there is an extra factor of 2.


answered 13 Mar, 04:53

john_smith_3's gravatar image

accept rate: 25%

edited 13 Mar, 04:58

"In any of the possible random forests, a fixed node uu is always in exactly one tree. So we have ∑t∈TI(t,u)p(t)=1"

Could u please explain this line?

(13 Mar, 08:38) vivek_19982995★

And just for clearing my doubt:-

If tree is


             /    \

           2.       3

Then T={{1,2},{1,3},{2,3},{1},{2},{3},{1,2,3}} right?

(13 Mar, 08:43) vivek_19982995★

$\{2,3\}$ do not form a tree so should not be in $T$.

(13 Mar, 13:02) john_smith_36★

Yeah sorry,can u answer my first question plz?

(13 Mar, 16:18) vivek_19982995★

Part 1:
This reply to Vivek is coming in three parts.

We started with a single tree, and randomly decomposed it into a forest of trees.
Let $f_1, f_2, \ldots , f_k$ be the possible resulting forests.
Suppose the corresponding probabilities of reaching each forest are $p_1, p_2, \ldots, p_k$.

First key observation is that \begin{equation} p_1 + p_2 + \cdots p_k = 1 \end{equation}

Second observation is that if we fix a node $u$, then in each forest there is just one tree containing node $u$.

(14 Mar, 02:47) john_smith_36★

Part 2:

Let $t$ be a tree. Then $p(t)$, the probability of $t$ being in the resulting random forest, is given by the sum of the $p_i$ for all the forests $f_i$ containing $t$.
In symbols, we write this as: \begin{equation} p(t) = \sum_{i~{\rm with}~t~{\rm in}~f_i} p_i \end{equation}

Now fix a node $u$ and sum over different trees containing $u$ \begin{equation} \sum_{t \in T~{\rm where}~t~{\rm contains}~u} p(t) = \sum_{t \in T~{\rm where}~t~{\rm contains}~u~~} \sum_{i~{\rm with}~t~{\rm in}~f_i} p_i \end{equation}

(14 Mar, 02:48) john_smith_36★

Because $I(t,u) = 1$ when $t~{\rm contains}~u$, and $0$ otherwise, the left hand side is the same as \begin{equation} \sum_{t \in T} I(t,u) p(t) \end{equation}

For the right hand side, we ask ourselves how many times does each $p_i$ appear in the combined sum? Answer, once for every tree in $f_i$ containing node $u$.
How many trees are there in forest $f_i$ containing node $u$? Exactly 1.

So the right hand side simplifies to \begin{equation} \sum_{i=1 \ldots k} p_i = 1 \end{equation}

which is the result we require.

(14 Mar, 02:48) john_smith_36★

Thank you so much!! Wished i could upvote you 1 more time.

(14 Mar, 07:52) vivek_19982995★
showing 5 of 8 show all

I did the convolution in pypy 2 in a bit unusual way. Python can handle multiplication of arbitrary sized integers, for example numbers like $64 \cdot 10^{10^5}$. Convolution can be expressed as integer multiplication, so why not just use python's built in integer multiplication. From this idea I made my own fast convolution using big integers, which got accepted. How I love python =). Unfortunately this convolution was only able to get me 60 points on the CHEFKNN problem. Had to use C++ to fully solve that one.

Link to accepted pypy2 solution to CUTTREE

Link to 60points pypy2 solution to CHEFKNN

Here is the code, doing fast convolution with big integers:

View Content

answered 13 Mar, 05:54

gorre_morre's gravatar image

accept rate: 0%

edited 13 Mar, 06:00

Wow! I could never imagine an FFT problem pass in Python. Great stuff. Especially for this problem, getting it to pass in C++ was not straight-forward, at least for me.

(13 Mar, 12:18) wittyceaser6★

We can decompose x2 in the following fashion, where each f(uk)=1:... what is f(uk) ?


answered 12 Mar, 19:20

nik_84's gravatar image

accept rate: 0%

$u_k$ represents a vertex. $f(u_k)$ just represents that we are contributing because of vertex $u_k$ in the form of $f(u_k)$.

(12 Mar, 23:30) adkroxx7★

Same approach except for the fact that I used karatsuba for all the polynomial multiplications, barely passes all the test cases with all the optimizations.


answered 13 Mar, 18:26

mathmaniac's gravatar image

accept rate: 0%

toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported
  • mathemetical formulas in Latex between $ symbol

Question tags:


question asked: 28 Feb, 14:03

question was seen: 1,096 times

last updated: 14 Mar, 07:52