I was astonished when I saw this problem, as it basically asks us to evaluate the given polynomial at constrained values of K. I know that the fast Fourier transform multiplies two polynomials in O(n log n), but there we use the n-th roots of unity rather than arbitrary values of K, which halves the input size at each level of the recurrence. If we are constrained to a particular K, how can we sample N points in better than O(N^2)? Also, what does it mean to "design" a DFT that works for some particular modulus? asked 13 Jul '16, 15:41

Look at all possible remainders of $x^2$ modulo the given prime. How many different remainders exist? Do the same with $x^4$, $x^8$ etc. How many remainders exist? My approach uses this and it's almost brute force. answered 13 Jul '16, 16:05
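
The counting suggested above can be tried directly. Below is a minimal sketch, assuming the prime is 786433 (the modulus quoted elsewhere in this thread); the function name `remainder_counts` is made up for illustration and is not taken from any posted solution. It squares the whole set of residues repeatedly and records how many distinct values survive, i.e. the number of distinct remainders of $x^2$, $x^4$, $x^8$, and so on.

```python
P = 786433  # assumed modulus, taken from the thread; P - 1 = 3 * 2**18


def remainder_counts(levels, modulus=P):
    """Return how many distinct remainders x^2, x^4, x^8, ... have modulo `modulus`.

    Entry k (0-based) is the number of distinct values of x^(2^(k+1)) mod modulus
    over all x in [0, modulus). `remainder_counts` is a hypothetical helper name.
    """
    values = set(range(modulus))  # every possible remainder of x itself
    counts = []
    for _ in range(levels):
        # Squaring every element of {x^(2^k)} gives exactly {x^(2^(k+1))}.
        values = {v * v % modulus for v in values}
        counts.append(len(values))
    return counts


if __name__ == "__main__":
    for k, count in enumerate(remainder_counts(18), start=1):
        print(f"x^(2^{k}): {count} distinct remainders")
```

Because the nonzero remainders form a cyclic group of order $3 \cdot 2^{18}$, each squaring halves the number of nonzero values until only three remain (plus zero).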

I did not use FFT / NTT in this problem. I used an insight similar to @xellos0's. I recursively decomposed the polynomial into four polynomials in terms of $x^4$ and used an unordered_map to save the result of each decomposition. I continued decomposing until only one term (the constant term) was left. The number of possible remainders when the function $x^4 \bmod 786433$ is applied repeatedly drops drastically with every application, which gives an opportunity for memoization / DP. I don't know how to prove this mathematically, but I wrote a program that counts the number of possible remainders, and this is indeed true for powers of two. Notice that the decomposition produces a tree in which every node has four children, so I used 4-ary heap-like indexing. https://www.codechef.com/viewsolution/10762921 You can decompose a polynomial into four polynomials in this way: $A(x) = A_{4k}(x^4) + x A_{4k+1}(x^4) + x^2 A_{4k+2}(x^4) + x^3 A_{4k + 3}(x^4)$ In the formula above, $A_{4k}$ is built from the coefficients whose indices are multiples of $4$, $A_{4k+1}$ from the coefficients whose indices are a multiple of $4$ plus $1$, and so on. We can also decompose a polynomial into $8$ polynomials in terms of $x^8$. I think I would have gotten a faster running time with a higher power such as $8$, because the number of remainders shrinks even faster. I hope the logic is sufficiently understandable; let me know if anything in my explanation is unclear. answered 14 Jul '16, 01:29
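
The four-way decomposition with memoization can be sketched roughly as follows. This is not the linked submission: it is a small Python illustration that uses a plain dict in place of unordered_map and identifies each sub-polynomial by a (start, stride) pair rather than by 4-ary heap indices. The names `make_evaluator` and `eval_node` are hypothetical.

```python
P = 786433  # modulus quoted in the post above


def make_evaluator(coeffs, modulus=P):
    """Return a function x -> A(x) mod modulus, where A(x) = sum(coeffs[i] * x**i).

    Sketch of the split A(x) = A0(x^4) + x*A1(x^4) + x^2*A2(x^4) + x^3*A3(x^4),
    caching every (sub-polynomial, argument) pair. Hypothetical helper.
    """
    n = len(coeffs)
    memo = {}  # (start, stride, x) -> value of that sub-polynomial at x

    def eval_node(start, stride, x):
        # This node is B(y) = sum_j coeffs[start + j*stride] * y**j, evaluated at y = x.
        if start >= n:
            return 0  # no coefficients left in this branch
        if start + stride >= n:
            return coeffs[start] % modulus  # a single (constant) term remains
        key = (start, stride, x)
        if key in memo:
            return memo[key]
        x2 = x * x % modulus
        x4 = x2 * x2 % modulus
        result = 0
        power = 1  # x**r for r = 0, 1, 2, 3
        for r in range(4):
            # Child r holds the coefficients of B whose index is congruent to r mod 4.
            child = eval_node(start + r * stride, 4 * stride, x4)
            result = (result + power * child) % modulus
            power = power * x % modulus
        memo[key] = result
        return result

    return lambda x: eval_node(0, 1, x % modulus)


if __name__ == "__main__":
    evaluate = make_evaluator([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
    print(evaluate(2), evaluate(12345))
```

The cache only pays off across many queries: deeper nodes are called with $x^4$, $x^{16}$, and so on, which take few distinct values, so different query points end up sharing the same cached entries.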

A better solution, running in $O(n \log^2 n)$, is given here: https://www.student.cs.uwaterloo.ca/~cs487/handouts/script07.pdf I tried implementing it, but it got TLE, probably because of the large constant factor of the NTT/FFT multiplications. It took around 12 seconds in the worst case. answered 13 Jul '16, 15:46

Please look here: http://emaxx.ru/algo/fft_multiply If you don't speak Russian (like me), use Google Translate. answered 13 Jul '16, 19:52

I have discussed my approach on this page: https://discuss.codechef.com/questions/82993/workchefandpolyevalproblemsinjuly16?page=1#83020 answered 13 Jul '16, 20:38

I have shared my approach on this page: https://discuss.codechef.com/questions/82993/workchefandpolyevalproblemsinjuly16?page=1#83020 answered 13 Jul '16, 20:40
