### PROBLEM LINK:

**Author:** Sergey Kulik

**Tester:** Istvan Nagy

**Editorialist:** Xellos

### DIFFICULTY:

HARD

### PREREQUISITES:

Binary-indexed trees (BITs, Fenwick trees) with coordinate compression.

### PROBLEM:

There’s a tree with weighted vertices. You may split it into paths as long as the sum of weights on each path is non-negative. Find the number of ways to do that.

### QUICK EXPLANATION:

Tree DP. Root the tree; for each subtree, store downward paths with the number of ways to make that path in BITs. When processing a vertex v, take the BIT of its largest son, make it the BIT of v add paths through the remaining vertices in the subtree of v to it, recomputing the answer for that subtree.

### EXPLANATION:

Note: the number of ways to make everything is computed modulo the given prime MOD. For clarity, we won’t write it everywhere.

We’ll consider the tree rooted at vertex 1.

This was most definitely the toughest problem of the contest. While it doesn’t require any standard advanced tree algorithms, the optimal solution does something that resembles HLD. But before describing it, let’s look at a slower algorithm first.

- Tree DP

Since the tree is rooted, each vertex v is the root of a subtree T_v. We’ll compute the answers to the problem of only decomposing T_v into paths, ans(v).

The standard approach is bottom-up DP on a tree - let’s suppose that we’ve already recursively computed the answers for all subtrees of T_v except ans(v) and now want to compute ans(v). Since v is the root of T_v, the path through it can only continue down, through one or two of its sons. The first thing we can do is DFS down through subtrees of all sons of v; for any son s and each vertex u in T_s, we can calculate not only the sum on the path from v to u, but also the number of ways to decompose T_s into paths such that there’s an “unfinished” one from s to u, which will be the product of answers of all subtrees “hanging” from that path, because we’re free to decompose each of them independently.

As we go down in the DFS, we can compute those values - if we’re at R, we have the sum sum_p and the product A of answers of all subtrees hanging from that path above R. Recomputing sum_p is trivial. If u=R, then the number of ways to decompose the rest of T_s is A\cdot P_A, where P_A is the product of answers of all sons of R. Otherwise, we can go down to each son S_R of R, multiplying A by only the product of answers of all *other* sons. We can’t afford to loop over all sons to compute this product, though (what if T_R was a star?). Therefore, we’ll utilise primality of the modulo, which allows us to compute P_A/ans(S_R) using modular inverses of answers; for efficiency, we can compute the modular inverse of each ans(v) just once and remember them. However, the required modular inverse doesn’t exist if ans(S_R)=0\%MOD. One way to resolve this is remembering the P_A as the pair (product of non-zero answers of sons, number of zero answers of sons), the other is treating P_A=0 as a special case where the DFS has to go down to only one S_R with ans(S_R)=0, and the third is computing products of answers from prefix and suffix products. Either way, for a subtree with V vertices, we’ll obtain a list of V pairs (sum,ways) in O(V) time.

Now, we want to get the answer for v from them. In order to avoid cases like separately considering paths which go to 0, 1 and 2 sons of v, counting paths which go to 2 sons twice and trying to send a path twice down to the same son (which isn’t a path at all), we’ll process the sons of v in some order and remember the already processed part; initially, the processed part is just v. When processing T_s, we’ll try to merge each unfinished path up to s with all allowed unfinished paths up to v in the processed part - including a one-vertex path of only v, which accounts for the case of paths going down to only 1 son from v. Afterwards, we should add T_s to the processed part - go over all pairs (sum,ways), update and add each of them somehow (we’ll get to that later).

What exactly do we need to do when processing an unfinished path (sum_1,ways_1) in T_s? We need to find all pairs (sum_2,ways_2) in the processed part such that sum_2+sum_1 \ge 0, sum up ways_2 for them, multiply by ways_1 and the product of answers of all sons of v not used for either pair and add it to ans(v). Don’t forget that we need to add some pairs to the processed part; for segment sums / point updates, the ideal structure is a segment tree or binary-indexed tree.

There’s one small complication - the sums can be really big. Segment trees can be written in a way that allows even such sums to be added directly (see problem Game from IOI 2013). For BITs, that’s not the case, but we have a simpler workaround: just simulate everything first without counting ways to compute anything (this means we don’t need to sum up ways_2, so no BIT is necessary yet) and remember which sums we actually had BIT queries or updates for in a *map<>*; then, compress them to 0,1,2,..; if those numbers are assigned in non-increasing order, we’ll also solve the problem of BITs naturally answering sums of \le something and not \ge (as we need here).

The next question is: how to compute the product of answers of all sons not in either unfinished part efficiently? While we could try to compute the product of all sons and divide the number of ways for each unfinished path by the answer of the son to which it goes, once again, the problem of some ans(s)=0 arises. It can still be solved by casework, but it’s quite complicated and there’s a better way.

We’ll add the pairs (sum_1,ways_1) to the processed part in a way that incorporates the product of answers of all necessary processed sons already. For any son s, let P_r be the product of answers of all sons processed before s and P_l the product for sons to be processed after s. When computing the answer for (sum_1,ways_1), we’ll only have to sum up ways_2 and multiply that sum by P_l. After processing all those pairs for s, we’ll multiply everything in our BIT by ans(s) and add (sum_1+val_v,ways_1\cdot P_r).

Multiplying everything in the BIT (by some m) at once seems strange. It can be implemented efficiently if we remember m itself and when we want to add (a,b) to the BIT, we’ll add (a,b/m) instead; the queried sums have to be multiplied by m instead. Division by m is handled using modular inverses; if m=0, then we can just “clear” the BIT by checking all indices visited in it so far and zeroing the numbers in them; afterwards, it’s fine to set m=1.

Since all P_l and P_r can be computed easily, we have all that’s necessary to compute ans(v) using a classic tree DP approach.

- Speeding up

The above algorithm worked in O(N^2\log{N}) time worst-case (what if the tree is a path?). However, we’ve made a big progress there - in particular, we have a way to process subtrees of sons and add them to the processed part quite efficiently. The only inefficient part is at the start, when we’re adding a large subtree to a processed part that only consisted of a single vertex v. Wouldn’t it be better to reverse it and only “add” one vertex to a large subtree in an efficient way? If this can be done fast, then the best son’s subtree to add v to is the largest one.

Instead of DFS-ing through the whole T_v, we’ll take one of its sons s_m with the largest subtree (the situation where v is a leaf is trivial), then take the BIT with pairs (sum,w) that we had when v was s_m, count the number of ways to decompose T_v when the path through v goes only down to s_m, note that all sum-s in it are actually bigger by val_v and all numbers of ways in them are multiplied by the product of answers of subtrees of the remaining sons of v and declare it the BIT for v after processing T_{s_m}. Since counting the number of paths is just a single query on the BIT and multiplying by the answers of subtrees of all other sons, it takes an amortised time of O(\log{N}) - as opposed to the original O(|T_{s_m}|\log N). This is where the similarity with HLD lies: it’s as if we didn’t have BITs for vertices, but for HLD-paths and updated them as we moved up the paths (of course, the concepts of heavy and light edges is completely redundant here). Another outlook is that we’re starting with the processed part = the largest son’s subtree and merging smaller subtrees into it.

Why is this faster and just how fast is it? Well, since we have some O(\log{N}) time per BIT updates/queries (plus O(1) in the DFS) per vertex everytime, we just need to find out how many times each vertex u appears in subtrees of sons other than s_m. The answer is fairly well-known: whenever it happens for some v on the path from u to the root that u is in the subtree of some son s_s of v, then |T_v| > |T_{s_m}|+|T_{s_s}| > 2T_{s_s}, so there can’t be more than \log_2{N}=O(\log{N}) such vertices v or the whole tree would need to have more than N vertices; that means we’ll visit a total of O(N\log{N}) vertices u this way. Add to it some trivialities like computing modular inverses in O(\log{MOD}) and we see that the whole algorithm takes a fabulous O(N\log^2{N}) time.

The implementation is basically the same as before, the only thing to fix is compression in BITs. We don’t know what values will be added to each BIT, so there’s no other choice but to run the whole algorithm without counting answers for subtrees first, simulating the computations of all sums for all vertices v and the queries/updates on BITs. As before, we only need to put the queried/updated indices in *map<>*s.

We’ve already described how to add pairs (sum,ways) into a BIT with a note that all numbers in it are multipled by m. Adding another note that all indices are shifted by x is even easier, we just have to add (sum-x,ways/m). The shifts by x need to be handled in the initial run for compression, too.

### AUTHOR’S AND TESTER’S SOLUTIONS:

The author’s solution can be found here.

The tester’s solution can be found here.

The editorialist’s solution can be found here.