PROBLEM LINK:Author: Sergey Kulik DIFFICULTY:HARD PREREQUISITES:Binaryindexed trees (BITs, Fenwick trees) with coordinate compression. PROBLEM:There's a tree with weighted vertices. You may split it into paths as long as the sum of weights on each path is nonnegative. Find the number of ways to do that. QUICK EXPLANATION:Tree DP. Root the tree; for each subtree, store downward paths with the number of ways to make that path in BITs. When processing a vertex $v$, take the BIT of its largest son, make it the BIT of $v$ add paths through the remaining vertices in the subtree of $v$ to it, recomputing the answer for that subtree. EXPLANATION:Note: the number of ways to make everything is computed modulo the given prime $MOD$. For clarity, we won't write it everywhere. We'll consider the tree rooted at vertex 1. This was most definitely the toughest problem of the contest. While it doesn't require any standard advanced tree algorithms, the optimal solution does something that resembles HLD. But before describing it, let's look at a slower algorithm first.
Since the tree is rooted, each vertex $v$ is the root of a subtree $T_v$. We'll compute the answers to the problem of only decomposing $T_v$ into paths, $ans(v)$. The standard approach is bottomup DP on a tree  let's suppose that we've already recursively computed the answers for all subtrees of $T_v$ except $ans(v)$ and now want to compute $ans(v)$. Since $v$ is the root of $T_v$, the path through it can only continue down, through one or two of its sons. The first thing we can do is DFS down through subtrees of all sons of $v$; for any son $s$ and each vertex $u$ in $T_s$, we can calculate not only the sum on the path from $v$ to $u$, but also the number of ways to decompose $T_s$ into paths such that there's an "unfinished" one from $s$ to $u$, which will be the product of answers of all subtrees "hanging" from that path, because we're free to decompose each of them independently. As we go down in the DFS, we can compute those values  if we're at $R$, we have the sum $sum_p$ and the product $A$ of answers of all subtrees hanging from that path above $R$. Recomputing $sum_p$ is trivial. If $u=R$, then the number of ways to decompose the rest of $T_s$ is $A\cdot P_A$, where $P_A$ is the product of answers of all sons of $R$. Otherwise, we can go down to each son $S_R$ of $R$, multiplying $A$ by only the product of answers of all other sons. We can't afford to loop over all sons to compute this product, though (what if $T_R$ was a star?). Therefore, we'll utilise primality of the modulo, which allows us to compute $P_A/ans(S_R)$ using modular inverses of answers; for efficiency, we can compute the modular inverse of each $ans(v)$ just once and remember them. However, the required modular inverse doesn't exist if $ans(S_R)=0\%MOD$. One way to resolve this is remembering the $P_A$ as the pair (product of nonzero answers of sons, number of zero answers of sons), the other is treating $P_A=0$ as a special case where the DFS has to go down to only one $S_R$ with $ans(S_R)=0$, and the third is computing products of answers from prefix and suffix products. Either way, for a subtree with $V$ vertices, we'll obtain a list of $V$ pairs $(sum,ways)$ in $O(V)$ time. Now, we want to get the answer for $v$ from them. In order to avoid cases like separately considering paths which go to 0, 1 and 2 sons of $v$, counting paths which go to 2 sons twice and trying to send a path twice down to the same son (which isn't a path at all), we'll process the sons of $v$ in some order and remember the already processed part; initially, the processed part is just $v$. When processing $T_s$, we'll try to merge each unfinished path up to $s$ with all allowed unfinished paths up to $v$ in the processed part  including a onevertex path of only $v$, which accounts for the case of paths going down to only 1 son from $v$. Afterwards, we should add $T_s$ to the processed part  go over all pairs $(sum,ways)$, update and add each of them somehow (we'll get to that later). What exactly do we need to do when processing an unfinished path $(sum_1,ways_1)$ in $T_s$? We need to find all pairs $(sum_2,ways_2)$ in the processed part such that $sum_2+sum_1 \ge 0$, sum up $ways_2$ for them, multiply by $ways_1$ and the product of answers of all sons of $v$ not used for either pair and add it to $ans(v)$. Don't forget that we need to add some pairs to the processed part; for segment sums / point updates, the ideal structure is a segment tree or binaryindexed tree. There's one small complication  the sums can be really big. Segment trees can be written in a way that allows even such sums to be added directly (see problem Game from IOI 2013). For BITs, that's not the case, but we have a simpler workaround: just simulate everything first without counting ways to compute anything (this means we don't need to sum up $ways_2$, so no BIT is necessary yet) and remember which sums we actually had BIT queries or updates for in a map<>; then, compress them to $0,1,2,..$; if those numbers are assigned in nonincreasing order, we'll also solve the problem of BITs naturally answering sums of $\le$ something and not $\ge$ (as we need here). The next question is: how to compute the product of answers of all sons not in either unfinished part efficiently? While we could try to compute the product of all sons and divide the number of ways for each unfinished path by the answer of the son to which it goes, once again, the problem of some $ans(s)=0$ arises. It can still be solved by casework, but it's quite complicated and there's a better way. We'll add the pairs $(sum_1,ways_1)$ to the processed part in a way that incorporates the product of answers of all necessary processed sons already. For any son $s$, let $P_r$ be the product of answers of all sons processed before $s$ and $P_l$ the product for sons to be processed after $s$. When computing the answer for $(sum_1,ways_1)$, we'll only have to sum up $ways_2$ and multiply that sum by $P_l$. After processing all those pairs for $s$, we'll multiply everything in our BIT by $ans(s)$ and add $(sum_1+val_v,ways_1\cdot P_r)$. Multiplying everything in the BIT (by some $m$) at once seems strange. It can be implemented efficiently if we remember $m$ itself and when we want to add $(a,b)$ to the BIT, we'll add $(a,b/m)$ instead; the queried sums have to be multiplied by $m$ instead. Division by $m$ is handled using modular inverses; if $m=0$, then we can just "clear" the BIT by checking all indices visited in it so far and zeroing the numbers in them; afterwards, it's fine to set $m=1$. Since all $P_l$ and $P_r$ can be computed easily, we have all that's necessary to compute $ans(v)$ using a classic tree DP approach.
The above algorithm worked in $O(N^2\log{N})$ time worstcase (what if the tree is a path?). However, we've made a big progress there  in particular, we have a way to process subtrees of sons and add them to the processed part quite efficiently. The only inefficient part is at the start, when we're adding a large subtree to a processed part that only consisted of a single vertex $v$. Wouldn't it be better to reverse it and only "add" one vertex to a large subtree in an efficient way? If this can be done fast, then the best son's subtree to add $v$ to is the largest one. Instead of DFSing through the whole $T_v$, we'll take one of its sons $s_m$ with the largest subtree (the situation where $v$ is a leaf is trivial), then take the BIT with pairs $(sum,w)$ that we had when $v$ was $s_m$, count the number of ways to decompose $T_v$ when the path through $v$ goes only down to $s_m$, note that all $sum$s in it are actually bigger by $val_v$ and all numbers of ways in them are multiplied by the product of answers of subtrees of the remaining sons of $v$ and declare it the BIT for $v$ after processing $T_{s_m}$. Since counting the number of paths is just a single query on the BIT and multiplying by the answers of subtrees of all other sons, it takes an amortised time of $O(\log{N})$  as opposed to the original $O(T_{s_m}\log N)$. This is where the similarity with HLD lies: it's as if we didn't have BITs for vertices, but for HLDpaths and updated them as we moved up the paths (of course, the concepts of heavy and light edges is completely redundant here). Another outlook is that we're starting with the processed part = the largest son's subtree and merging smaller subtrees into it. Why is this faster and just how fast is it? Well, since we have some $O(\log{N})$ time per BIT updates/queries (plus $O(1)$ in the DFS) per vertex everytime, we just need to find out how many times each vertex $u$ appears in subtrees of sons other than $s_m$. The answer is fairly wellknown: whenever it happens for some $v$ on the path from $u$ to the root that $u$ is in the subtree of some son $s_s$ of $v$, then $T_v > T_{s_m}+T_{s_s} > 2T_{s_s}$, so there can't be more than $\log_2{N}=O(\log{N})$ such vertices $v$ or the whole tree would need to have more than $N$ vertices; that means we'll visit a total of $O(N\log{N})$ vertices $u$ this way. Add to it some trivialities like computing modular inverses in $O(\log{MOD})$ and we see that the whole algorithm takes a fabulous $O(N\log^2{N})$ time. The implementation is basically the same as before, the only thing to fix is compression in BITs. We don't know what values will be added to each BIT, so there's no other choice but to run the whole algorithm without counting answers for subtrees first, simulating the computations of all sums for all vertices $v$ and the queries/updates on BITs. As before, we only need to put the queried/updated indices in map<>s. We've already described how to add pairs $(sum,ways)$ into a BIT with a note that all numbers in it are multipled by $m$. Adding another note that all indices are shifted by $x$ is even easier, we just have to add $(sumx,ways/m)$. The shifts by $x$ need to be handled in the initial run for compression, too. AUTHOR'S AND TESTER'S SOLUTIONS:The author's solution can be found here.
This question is marked "community wiki".
asked 14 Nov '15, 19:59

xellos0 : I did not understand this solution properly can anyone please help me. answered 10 Jan '16, 09:27
The solution is long, which part did you not understand? Seeing as you're unrated, it's not yet time to jump to hardest problems.
(11 Jan '16, 00:16)
Hey what i didnot understood 1) As in editorial the no of ways to find is A.PA, But how are we calculating PA Efficently. How are we calculating no of ways for each node.
(19 Jan '16, 22:43)
