×

COMPCLUB - Editorial

Author: Ramazan Rakhmatullin
Testers: Lewin Gan, Marek Sommer
Editorialist: Pawel Kacprzak

Medium

PREREQUISITES:

Trees, dynamic programming, DFS

PROBLEM:

The problem can be reformulated as follows:

There is given a tree $T$ with $N$ nodes numbered from $0$ to $N-1$, where the $i$-th node has a color $color(i)$ and level $level(i)$. The goal is to find for each node $v$ the number of non-empty sequences of nodes: $v_0, v_1, \ldots, v_k$ such that:

1. $v_0 = v$, i.e. the sequence starts in $v$
2. All $v_i$ in the sequence have the same color
3. For $i < k$, $v_i$ is ancestor of $v_{i+1}$ in the tree
4. $level(v_k) = 0$
5. For each $i < k$, $level(v_i) = level(v_{i+1}) + 1$

We call each such sequence a good sequence.

EXPLANATION:

First of all, in the input, there is given an upper limit on maximum level, but we're going to ignore completely this variable.

In the first subtask, the sum of $N$ over all test cases is at most $5000$. Also, there is an upper limit of $20000$ for the sum of numbers in the output, but we won't use it in the below solution.

The general idea is to use dynamic programming on the tree. Just for a minute let's forget about efficiency and just defined the most straightforward recursive equation for the tree.

Let $dp[v][c][k]$ be the number of good sequences of length $k$, color $c$ in $v$'s subtree. Let's fix node $v$, and let $result[v]$ be the number of good sequences starting in $v$, i.e. the result for $v$, and $count(c)$ be the number of nodes with color $c$.. Then both $result[v]$ and $dp[v][c][k]$ for each $c$ and $k$ can be computed in the following steps:

1. For any $c$, $dp[v][c][0] = 0$ just by the definition of a good sequence.
2. For every color $c$ and any $k$, $dp[v][c][k]$ can be initiallized as $\sum\limits_{u \in children(v)} dp[u][c][k]$.
3. If $level(v) = 0$, then $result[v] = 1$, because the only good path starting in $v$ also has to ends in $v$.
4. If $level(v) \geq count(color(v))$ then $result[v] = 0$ because there are not enough nodes of color $color(v)$ to form a good sequence starting in $v$.
5. Otherwise, $level(v) > 0$, so $result[v] = \sum\limits_{u \in children(v)} dp[u][color(v)][level(v)]$. because any good path of color $color(v)$ of length $level(v)$ can be extended by $v$ to a good path of length $level(v)+1$.
6. Finally, we add $result[v]$ to $dp[v][color(v)][level(v)+1]$

Ok, but what are the time and memory complexity of this method. Well, there are $N$ nodes, at most $N$ colors and the maximum length of a sequence can be at most $N$, so it looks like the $dp$ table has to have $N^3$ entries. However, for a fixed color $c$, the maximum length of a sequence of color $c$ is at most $count(c)$. This is true because each two consecutive elements of a sequence have to have consecutive levels and the last node has to have level $0$. Thus, the $dp$ array, in fact, has at most $O(N^2)$ entries. If this approach is implemented with a DFS, then each entry $dp[v][c][k]$ is passed only a constant number of times to its parent, and because there are $O(N^2)$ such entries, the total time complexity is $O(N^2)$.

The approach for the first subtask given in this editorial should also pass for the second subtask if only rather than passing everywhere $dp[v][c][k]$ for every color $c$, we do it only for colors occurring in the input. Constraints for this subtasks are pretty much similar, so perhaps there exist other slight improvements over the method from the first subtask.

In the third subtask, $N$ is at most $10^5$ and sum over $N$ in all test cases is at most $5 \cdot 10^5$. These constraints are large enough to prevent any quadratic time approaches. One idea for a solution passing them, which should be too slow to pass the final subtask, is to create for each color $c$ a compressed tree containing only nodes of color $c$. A compressed tree here means that if there are $count(c)$ nodes of color $c$, then such a tree should contain $O(count(c))$ nodes and for any two nodes $u$ and $v$ of colors $c$ if $u$ is an ancestor of $v$ in the original tree then it's also an ancestor of $v$ in the compressed tree. All such compressed trees can be created using LCA algorithm in $O(N \cdot \log(N))$ time. After that, dynamic programming approaches can be used to find the results separately for each compressed tree.

In the last subtask, $N$ can be up to $5 \cdot 10^5$ and sum of $N$ over all test cases is at most $10^6$, which suggests that this subtask requires a really fast solution, especially considering that the time limit for this problem is 1 second. Fortunately, a really beautiful and easy to implement solution exists.

The general idea is to improve the complexity of the method described in the solution for the first subtask (which I suggest read first).

Let $count(c)$ be the number of nodes of color $c$ in the tree and $result[v]$ be the number of good sequences starting in $v$, i.e. the final result for $v$.

Now, let's assume that we run DFS over the tree staring in the root and that $dp[c][k]$ is the number of good sequences of color $c$ starting in nodes with level $k$ found so far in the tree, and we've just entered node $v$. Let's save the current value of $dp[color(v)][level(v)-1]$ into variable $s$ or assign $s := 0$ if $level(v) = 0$. Notice that $s$ now represents the value of $dp[color(v)][level(v)-1]$ before entering the $v$'s subtree. Next, let's run $DFS$ for all children of $v$ recursively. After all these recursive calls complete, results for all nodes in $v$'s subtree except $v$ are already computed. The only thing left to do for this subtree is to compute the result for $v$. There are a few cases to consider:

1. If $level(v) \geq count(color(v))$ then we know that $result[v] = 0$, because there are not enought nodes in the tree to for any good sequence starting in $v$.
2. Else if $level(v) = 0$, then $result[v] = 1$, because the only good sequence starting in $v$ ends also in $v$.
3. Otherwise, $1 \leq level(v) < count(color(v))$. Notice that after all recursive calls for children of $v$ finished, then $dp[c][k]$ counts the number of good sequences of nodes with color $c$ starting in nodes with level $k$ counted before DFS entered $v$'s subtree and also these ones counted while DFS was run for all children of $v$. Remembering what we saved to variable $s$ before running $DFS$ for children of $v$, $result[v]$ can be computed as the current value of $dp[color(v)][level(v)-1]$ minus $s$. After that, we just add the value $result[v]$ to $dp[color(v)][level(v)]$.

The total time complexity of this method is $O(N)$ because $dp$ table contains $O(N)$ entries and while DFS visits a single node it performs a constant number of operations not counting iterating over children of the current node and recursive calls.

For implementation details of this exact approach please refer to Marek's solution and Pawel's (mine) solution using the same variable names as the editorial.

AUTHOR'S AND TESTER'S SOLUTIONS:

Setter's solution can be found here.
First tester's solution can be found here.
Second tester's solution can be found here.
Editorialist's solution can be found here.

This question is marked "community wiki".

74484896
accept rate: 12%

19.6k349497539

Editorialist's solution is simple and understandable

(28 Jun '17, 22:29)

 2 Even though I wasn't able to solve this problem during the contest - I was really confident that the idea from subtask 3 could solve this problem. With a lot of submissions, small optimizations, I've passed the tests. I think that some tests could TLE with my approach and I challenge anyone with time to find a good testcase :D. What I did: my_solution build a tree with all nodes O(n) complexity dfs iterative (for optimization) - in this dfs I build a "compressed" tree for every color O(N) - (I don't know why the editorial state that this should take O(N * log N) for every tree make a dfs that returns for the subtree a map of {length - count} for the entire subtree merge the maps returned by every dfs for a child and obtain the full map for current node compute the solution with the idea from subtask 1 I have to admit that the "good" solution is amazing and this post is just to confirm that the idea from subtask 3 could pass the current tests. Can anyone provide a test case that will TLE? Thanks in advance. answered 27 Jun '17, 03:46 6★inseder 350●4●8●19 accept rate: 0% Same here, also did merging of the DP in O(n log n) which luckily passed the last case (link). Interested to find an input that hacks this for a TLE as well. (27 Jun '17, 16:30) hikarico5★
 1 Thank You :) Curious to know the solution answered 24 Jun '17, 22:48 61●5 accept rate: 0%
 1 Someone please explain the use of variable s, from pkacprzak's solution, I didn't get that. answered 25 Jun '17, 10:41 338●10 accept rate: 14%
 1 @tapanr97 I will try to explain in my words. First of all everything starts from the explanation of the first subtask. The result is obtained by filling up dp[v][c][k] with the appropriate values. This logic is really good but you need as explained in the editorial O(N^2). Now lets think of a node. While you run dfs when you get to a node you need the values for the children of level - 1. After you completed a node you can easily see that the values in the dp table for children is not need anymore. Now getting back to your question the solution relies on a way of getting rid of this extra memory. Let's assume that you get to a node where you have that dp[c][k] computed. before running the dfs for the children s = dp[c][k] = "the number of good sequences of color c starting in nodes with level k found so far in the tree" after you run dfs for a child u: dp[c][k] = s + x -> where x is the amount of good sequences of color c starting in nodes with level k in the subtree rooted in u going this for all children will result in: dp[c][k] = s + sum(x_u) for the result of the node v you need the sum of good sequences of color(v) starting with level(v) - 1 in the subtree of v - this is the dp'[c][l-1] - dp[c][l-1]. I noted dp' as the value of table after the dfs was run over the children of the node. This is what I understood. Please correct me if I am wrong. answered 28 Jun '17, 01:41 6★inseder 350●4●8●19 accept rate: 0%
 0 Can someone please tell me why my solution is giving TLE? I wrote the exact same code as @pkacprzak. Solution Link answered 25 Jun '17, 06:39 53●4 accept rate: 0% 1 In fact, the real reason is that you used std::endl to output new lines. Here is a submission with using explicitly "\n" new line character: https://www.codechef.com/submit/complete/14338613 You can read about the difference between these methods here: https://stackoverflow.com/a/213977/948918 In this problem the output is quite large and time limit very strict, so it's not a good idea to flush the buffer frequently. (25 Jun '17, 06:54) 1 Thank you for pointing out. I will keep it in mind. (25 Jun '17, 07:10)
 0 Can any one help me to find why i get TLE my code is here: http://ideone.com/s9OWbY my approach: dfs with inTime, outTime for every node then sort queries by (club, level) then get the sum using segment tree answered 25 Jun '17, 11:55 2★gasser 3●1 accept rate: 0%
 0 @chandyshot variable s is used to substract the the number of sequences having color c and level k which have not started with v answered 26 Jun '17, 14:20 1 accept rate: 0%
 0 can you please explain the role of variable s? Why we have minus it from the dp array to get the result? answered 26 Jun '17, 20:55 1 accept rate: 0%
 0 I couldn't understand the use of variable s. Why are we subtracting s from result[v]? Can anyone give proper explaination? answered 27 Jun '17, 23:06 4★tapanr97 16●1 accept rate: 25%
 0 Thank you...Now I got it @inseder answered 28 Jun '17, 11:42 4★tapanr97 16●1 accept rate: 25%
 0 If level(v)≥count(color(v))level(v)≥count(color(v)) then result[v]=0result[v]=0 because there are not enough nodes of color color(v)color(v) to form a good sequence starting in v..can anyone please explain this statement. answered 28 Jun '17, 13:26 1★sinha01 1 accept rate: 0%
 0 If level(v) ≥ count(color(v)) then result[v]=0 because there are not enough nodes of color color(v) to form a good sequence starting in v. can anyone please explain thsi statement? answered 28 Jun '17, 13:27 1★sinha01 1 accept rate: 0%
 0 what is the significance of variable s? answered 28 Jun '17, 20:54 1★sinha01 1 accept rate: 0%
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×15,182
×2,535
×2,020
×679
×48
×34
×10

question asked: 24 Jun '17, 10:58

question was seen: 2,061 times

last updated: 28 Jun '17, 22:29