PROBLEM LINK:
Author: Jatin Yadav DIFFICULTY:Hard PREREQUISITES:Fast Fourier Transform, Centroid decomposition, Properties of Expectation PROBLEM:You are given a tree with $N$ nodes. On each of the next $N1$ days, you remove one of the remaining edges to create a forest. The strength of a forest is defined to be the sum of squares of the sizes of its connected components. You need to compute the expected value of the strength of the resulting forest after day $i$ for all $0 \le i < n$, modulo $1000000007$. EXPLANATION:Lets try to find the expression for $E[x^2,i]$. Here $E[x^2,i]$ represent the expected value of the strength of the resulting forest after day $i$. Say, after $i$ days, there is a component of size $x$ consisting of vertices $(u_1,u_2,...,u_x)$. We know that, it will contribute $x^2$ to the answer with some probability. Now, what can be the other way to look at it? We can decompose $x^2$ in the following fashion, where each $f(u_k)=1$: $\Rightarrow E[x^2,i] = E[\sum_{k=1}^x f(u_k)^2,i] + 2E[\sum_{a=1}^x\sum_{b=a+1}^xf(u_a)f(u_b),i]$ $\Rightarrow E[x^2,i] = E[x,i] + 2E[\sum_{a=1}^x\sum_{b=a+1}^xf(u_a)f(u_b),i]$ $\Rightarrow E[x^2,i] = n + 2E[uv,i]$ Now, the main problem reduces to finding the expected number of ordered pairs $(u,v)$ which are connected after $i$ days, indicated by $E[uv, i]$. This can be solved using linearity of expectation. Basically, we need to find what is the probability that $uv$ are connected after $i$ days? If we sum this probability for all the ordered pairs, we are done. What is the probability that $u,v$ are in the same connected component after $i$ days? For $u,v$ to be in same component all the edges on the shortest path from $u$ to $v$ must not be removed in $i$ days. Let the number of edges between $u$ and $v$ on their shortest path be $k$ (i.e. distance between $u$ and $v$ is $k$). Then the probability $P(uv,i)$ can be computed as: Notice that, above expression for probability only depends on the distance between $u$ and $v$. If we can somehow compute number of ordered pairs at distance $k$ as $D[k]$ for all $1 \le k < n$. We will take $D[0]=0$. Then, we can write the formula for expectation as: Counting the number of ordered pairs at distance $k$ for all $1 \le k < n$. This is a standard problem and can be done in $O(nlog^2n)$ with the help of centroid decomposition and FFT. Following are the step for doing this:
Computing the value of $E[uv,i]$ modulo $1000000007$ for all $0 \le i < n$. Naive computation of $E[uv,i]$ for all $0 \le i < n$ will take $O(n^2)$ time, which is not feasible here. Lets take a closer look at the formula for $E[uv,i]$ now: $\Rightarrow E[uv,i] = \frac{(n1i)!}{(n1)!} \sum_{k=0}^{n1i}D[k]\frac{(n1k)!}{(n1ik)!}$ $\Rightarrow E[uv,i] = \frac{(n1i)!}{(n1)!} H(i)$ where, $\Rightarrow H(i)=\sum_{k=0}^{n1i}(D[k] (n1k)! )\left ( \frac{1}{(n1ik)!} \right )$ Observe that $H(i)$ for all $0 \le i < n$ can now be computed from convolution of polynomial $A(x)$ and $B(x)$ where: $B(x) = \left(\frac{1}{0!}\right) x^0 + \left(\frac{1}{1!}\right) x^1 + ... \left(\frac{1}{r!}\right) x^r + ... + \left(\frac{1}{(n1)!}\right) x^{n1}$ Value of $H(i)$ is the coefficient of $x^{n1i}$ in $A(x)*B(x)$ modulo $1000000007$. Multiplication of two polynomials with arbitrary modulo. We have to multiply two polynomials and then output result coefficients modulo $M$ which is not necessary of the kind $c · 2^k + 1$. Coefficients in multiplication can be up to $O(M^2n)$ which usually can’t be handled precisely enough by floating point types. You can have a look at this pdf, in order to solve this problem efficiently in $O(nlogn)$. Time Complexity:$O(nlog^2n)$ AUTHOR'S AND TESTER'S SOLUTIONS
This question is marked "community wiki".
asked 28 Feb, 14:03

Let me try to write an explation for the expected strength of a forest. We start with a tree with $n$ nodes, and randomly decompose it into a forest of smaller trees. Let $T$ be the set of possible trees that can be in any random forest. And let $E$ be the expected strength of the random forest. We should start by summing the product of the probability of each random forest and the strengths of the trees. Then gather together the probability for each possible tree, and we will have \begin{equation} E = \sum_{t \in T} S_t \times p(t) \end{equation} The difficulty is that calculating $p(t)$ would be tricky. We will now go through a complicated looking process so that we can split $S_t$ into a sum of smaller pieces. Then we will be able to swap the order of summations, and perform a sum over $t \in T$ to simplify the expression for $E$. Define an indicator function $I(t,u)$ which has value 1 if node $u$ is in tree $t$, and 0 otherwise. Then, for a fixed tree $t$ with $r$ nodes, and summing over all nodes in the forest \begin{equation} \sum_u I(t,u) = r \end{equation} This last expression is a complicated way to say that there are just $r$ possible nodes $u$ where $I(t,u)=1$. Similarly, define another indicator function $I(t,u,v)$ to have value 1 if both nodes $u$ and $v$ are in tree $t$, and 0 otherwise. Then, for fixed tree $t$ with $r$ nodes, and summing over all ordered pairs of distinct nodes in the forest, \begin{equation} \sum_{u,v, u \ne v} I(t,u,v) = r(r1) \end{equation} Again, this expression is just saying that there are $r(r1)$ pairs of nodes $u,v$ in tree $t$. We can now rewrite $S_t$ as two sums: \begin{equation} S_t = r^2 = r + r(r1) = \sum_u I(t,u) + \sum_{u,v, u \ne v} I(t,u,v) \end{equation} And substituting into the expression for $E$, \begin{equation} E = \sum_{t \in T} \left( \sum_{u} I(t,u) + \sum_{u,v, u \ne v} I(t,u,v) \right) \times p(t) \end{equation} We remove the brackets to get \begin{equation} E = \sum_{t \in T} \sum_{u} I(t,u) p(t) + \sum_{t \in T} \sum_{u,v, u \ne v} I(t,u,v) p(t) \end{equation} The sums are all finite, so the order of the $\sum$s can be swapped. Thus \begin{equation} E = \sum_{u} \sum_{t \in T} I(t,u) p(t) + \sum_{u,v, u \ne v} \sum_{t \in T} I(t,u,v) p(t) \end{equation} In any of the possible random forests, a fixed node $u$ is always in exactly one tree. So we have \begin{equation} \sum_{t \in T} I(t,u) p(t) = 1 \end{equation} There are $n$ nodes in the forest, so summing over all possible nodes gives \begin{equation} \sum_{u} \sum_{t \in T} I(t,u) p(t) = \sum_{u} 1 = n \end{equation} Similar argument (which I am struggling to express) gives \begin{equation} \sum_{t \in T} I(t,u,v) \times p(t) = {\rm probability}~u~{\rm and}~v~{\rm are~in~same~connected component} \end{equation} And so we have the required result \begin{equation} E = n + \sum_{u,v, u \ne v} {\rm probability}~u~{\rm and}~v~{\rm are~in~the~same~connected~ component} \end{equation} The sum is over ordered pairs $(u,v)$ with $u \ne v$. If considering unordered pairs, there is an extra factor of 2. answered 13 Mar, 04:53
"In any of the possible random forests, a fixed node uu is always in exactly one tree. So we have ∑t∈TI(t,u)p(t)=1" Could u please explain this line?
(13 Mar, 08:38)
And just for clearing my doubt: If tree is
Then T={{1,2},{1,3},{2,3},{1},{2},{3},{1,2,3}} right?
(13 Mar, 08:43)
$\{2,3\}$ do not form a tree so should not be in $T$.
(13 Mar, 13:02)
Yeah sorry,can u answer my first question plz?
(13 Mar, 16:18)
Part 1: We started with a single tree, and randomly decomposed it into a forest of trees. First key observation is that \begin{equation} p_1 + p_2 + \cdots p_k = 1 \end{equation} Second observation is that if we fix a node $u$, then in each forest there is just one tree containing node $u$.
(14 Mar, 02:47)
Part 2: Let $t$ be a tree. Then $p(t)$, the probability of $t$ being in the resulting random forest, is given by the sum of the $p_i$
for all the forests $f_i$ containing $t$. Now fix a node $u$ and sum over different trees containing $u$ \begin{equation} \sum_{t \in T~{\rm where}~t~{\rm contains}~u} p(t) = \sum_{t \in T~{\rm where}~t~{\rm contains}~u~~} \sum_{i~{\rm with}~t~{\rm in}~f_i} p_i \end{equation}
(14 Mar, 02:48)
1
Because $I(t,u) = 1$ when $t~{\rm contains}~u$, and $0$ otherwise, the left hand side is the same as \begin{equation} \sum_{t \in T} I(t,u) p(t) \end{equation} For the right hand side, we ask ourselves how many times does each $p_i$ appear in the combined sum? Answer,
once for every tree in $f_i$ containing node $u$. So the right hand side simplifies to \begin{equation} \sum_{i=1 \ldots k} p_i = 1 \end{equation} which is the result we require.
(14 Mar, 02:48)
Thank you so much!! Wished i could upvote you 1 more time.
(14 Mar, 07:52)
showing 5 of 8
show all

I did the convolution in pypy 2 in a bit unusual way. Python can handle multiplication of arbitrary sized integers, for example numbers like $64 \cdot 10^{10^5}$. Convolution can be expressed as integer multiplication, so why not just use python's built in integer multiplication. From this idea I made my own fast convolution using big integers, which got accepted. How I love python =). Unfortunately this convolution was only able to get me 60 points on the CHEFKNN problem. Had to use C++ to fully solve that one. Link to accepted pypy2 solution to CUTTREE Link to 60points pypy2 solution to CHEFKNN Here is the code, doing fast convolution with big integers: answered 13 Mar, 05:54
Wow! I could never imagine an FFT problem pass in Python. Great stuff. Especially for this problem, getting it to pass in C++ was not straightforward, at least for me.
(13 Mar, 12:18)

Same approach except for the fact that I used karatsuba for all the polynomial multiplications, barely passes all the test cases with all the optimizations. answered 13 Mar, 18:26
