PRIMEDST - Editorial

PROBLEM LINKS

Practice
Contest

DIFFICULTY

HARD

PREREQUISITES

Fast Fourier Transform, Graph Theory

PROBLEM

You are given a tree with N nodes and exactly N-1 edges.

Find the number of pairs of nodes in the tree for which the length of the path (number of arcs between the nodes) between them is prime.

note

The problem actually asks for the probability that selecting two such nodes leads to a prime length path. But it is obvious that what is needed is the number of such pairs. The answer will just be

number-of-pairs / NC2

EXPLANATION

The explanation of the solution derives ideas largely from the Tester’s Solution.


The intended solution for this problem was based primarily on using FFT for convolution. I find this idea extremely elegant (and usually, much simpler, because its very generic).

I have experience with setting / testing very few problems using this central idea in various contests (3 of them on CodeChef) and always, the test data seems to have been cracked by a different - no, I don’t mean simpler, just different - approaches.

I say cracked, because these approaches might fail particular corner cases, which were not thought off during testing. I believe that generating the test data that would fail every cracky approach is very hard. Sometimes, an approach might not be possible to fail within the constraints.

This is probably the case in August Challenge. It would be interesting to know whether it was possible to fail the alternate approaches, or not!

ignore the prime numbers

The problem asks you to find the number of pairs at a distance of prime numbers. But, there are no elegant properties that you can use to be able to simplify the problem. Our objective will remain simpler.

  • For each node, find the number of nodes at all distances

In the end, we will only add up the number of nodes at prime distances

divide and conquer

  • Root the tree at some vertex v. This orients all the edges and defines a clear parent / child relationship between nodes
  • For each child of v, run the algorithm recursively on the subtree rooted at it
  • The sub-problems (sub-trees) will be of recursively smaller and smaller sizes

Now, for each child c of v, we know how many decendents are at each distance. We can update the values to be stored at v by adding 1 to the values from child c. Of course, values at v will be composed using all the values from all the children of v.

This takes care of all the paths that start at a node and always move downwards, towards the decendents.

For all the other paths, we can do a convolution of the arrays of values stored at each child of v. Of course, the fastest way to convolve is using Fast Fourier Transforms for polynomial multiplication. This idea has been explored in the July Challenge problem FARASA as well.

binarize

The given tree is not necessarily a binary tree. If we try to find a split of nodes in such a tree, we may end up with a split that selects only O(N / D) vertices in one set; here D is the largest degree of any node.

This would mean that our divide and conquer may perform too many FFTs in the worst case. This is highly undesirable.

If a node has more than 2 chilren

  • Give the node a new right child
  • Connect the node to its right child with a 0 cost edge
  • Move all but 1 child to the children of the new right child
  • The remaining child will be the left child of the node

In the final tree, the meaning of distance will be replaced by the sum of the costs of edges, instead of the number of edges.

Now, we can always find a split that selects at least O(N / 3) nodes in one set.

complexity

The complexity of the solution is O(N log2 N).

But, the constants for this solution are large. The Tester’s and Setter’s solutions based on the described approach are still slower than several solutions submitted during the contest.

SETTER’S SOLUTION

Can be found here.

TESTER’S SOLUTION

Can be found here.

12 Likes

Hi, great question indeed!
I also solved it using fft, but can you please describe the other method?

1 Like

I did it using DFS and flattening of tree, with complexity of N*P, where P is the number of primes till N.

http://www.codechef.com/viewplaintext/2521386



Here is the explanation to my algorithm:

Devising the algo:

Looking at the data, we see that N is a maximum of 50,000. Also, Let us denote the number of primes up till N with P. P can be a maximum of 5133. My algorithm is such that it will visit each of the N nodes a maximum of P times. If the tree has branches, the maximum depth or the levels that the tree will have, will be less than N. Otherwise, if tree is without branches, it will have N levels. So, on an average, the algo has complexity less than N*P, as worst case isn’t really attained.

The ALGORITHM:

First of all, I take an array PRM, which stores primes up till 50,000. Let us include 1 in the list as well. So, PRM[1]=1. PRM[2]=2. PRM[3]=3. PRM[4]=5…PRM[5134]=49999. Next, I take an array NOP, in which NOP[i] stores the index of that particular prime which is greater than or equal to i. So, NOP[4]=4 as the prime just greater than 4 is 5, and 5 is indexed at 4 in the array PRM.

Now let’s take the following tree:

![alt text][1]
[1]: http://discuss.codechef.com/upfiles/Tree.jpg

My algorithm roots the tree at node 1. Next I do a recursive DFS till I reach the leaves of the tree, i.e., 9, 6, 7, 8, 4.
So, now I take a bottom up approach. For the leaves, I just take a new array(vector) of length 1 and put the value 1 in it. Then I return this array. For the other nodes, say node 5, my DFS call to node 9 has returned this me an array of length 1. So I extend the array to length of 2, and add the value 1 to it. So now the array contains: 1 1. I added 1 because 5 is a single node at this level. Now I reach node 2. My recursive call to node 5 returned me an array 1 1, call to 6 and 7 returned arrays of unit lengths 1 containing 1. I take the longest array, i.e., the one I got from call to 5, and extend it by 1 unit and add a value 1 to it. So the array looks like 1 1 1.
Now I take the other array returned from call to 6 and for each element in this array, I check how many nodes are at a prime distance from it in the array 1 1 1. You can visualize it as if I have placed the two arrays side by size to look like this: 1 1 1 - 1, where the ‘-’ demarcates two arrays.
So, the 2nd ‘1’ in the first array is at a horizontal distance 2 from it. Thus, I increase my count of total nodes at a prime distance by 1. Next I join the two arrays and the final array looks like 1 2 1. The second element became 2 because node 5, 6 are at the same level. So I merged the two branches. Now I take the third branch and place it horizontally with the previous array. It now can be visualized as: 1 2 1 - 1. So two node are at a horizontal distance from the only element in 2nd array. So I add 2 to my count. I merge the two branches to get 1 3 1. I have no more branches so I return this array.

This is how my algorithm calculates number of pairs at prime distance. The purpose of storing NOP array is that when I am checking for horizontal distance say in such a case, where first array is 1 3 1 2 1 and second is 1 1 1 1 2. So, when I reach element 4 of second array, I only need to check for number of nodes at distance more than 4. Hence, I need to know the prime just larger than 4. This saves several operations.

Hope that helped.

14 Likes

hey can anyone give a link to a good tutorial on fft…

1 Like

In this constraints it can be solved in O(n^2). But there are two tricks, first - when we merge subrees, we should merge small subtree to big one to avoid MLE. second - we compute these values in reversed order, so we can update subtree by adding 0 to the end of vector. The worst case for my solution is a chain, and root is in the middle of it. But it’s complexity can be decreased to O(nk), where k is the number of intersting numbers. CodeChef: Practical coding for everyone

2 Likes

I also used FFT to convolve subtrees. The important thing in my code is not to use an arbitrary node as root, but use a root which minimizes the largest subtree (such a node can be found in O(n)). There always exists a node such that the largest subtree has at most ceil(n/2) nodes. This way, one can guarantee O(n log2 n) time.

1 Like

dfs approach seems much easier than FFT…

Can anyone give a link to FFT tutorial, in english,short and clear ?

Could you briefly explain the concept ?

3 Likes

Perhaps the problem setter could have used have used a denser series than prime numbers, for example, x s.t. mobius(x) = -1.

4 Likes

Yeah, it will be of great help if you can explain us the concept.

MAXimal :: algo :: Быстрое преобразование Фурье за O (N log N). Применение к умножению двух полиномов или длинных чисел maybe this

3 Likes

How can an N^2 solution pass? I did something similar to you but with complexity N*P, where P= number of primes till N. But how did you get an N^2 solution passed?

As I said above, the worst case is a chain, so we make about 25000*25000=6 10^8 operations, which should be ok with time limit 5secs(at least on my laptop it worked less one second)

well the time limit was too lenient. N^2 is too naive an algo for a question in the hard category…

I think your method is quite similar to the one explained in the editorial. The editorial uses FFT to ‘merge’ the two arrays at a particular node.

^That true. I believe the FFT implementation entices undue efficiency!

1 Like

Wouldn’t naive merging take O(n^2) time?

No, if you do by fft, merging is nlogn, and if you do it by the method I described above, then it is for any subtree, merging is (K x T), where K=number of nodes of subtree and T= Maximum depth amongst all the branches. For the whole tree it isnt N^2 because, nodes in two branches, which are at the same level, are merged in constant time, regardless of the number of nodes that are there. So, for the whole tree, the complexity is again (depth x number_of_nodes). Merging a binary tree takes NlogN. Similarly, this merging wont degenerate to N^2.

1 Like

If I have understood your method correctly, a ‘merge’ requires element-wise addition of arrays. How do you perform this in constant time? And isn’t (depth x number_of_nodes) O(n^2).