I was astonished after seeing this problem as this problem basically asks to calculate the convolution of the given polynomial with a constrained K. I know about fast fourier transform to multiply two polynomial in O(nlogn) but there we used the idea of nth roots of unity in place of some random K, to reduce the input size by half in each level of recurrence. But, if we are constrained to choose a paricular K, how to sample N points in better than O(N^2)!!! Also what does it means to “design” a DFT which works on some particular Modulo?

A better solution with O(n (log n)**2) is given here: https://www.student.cs.uwaterloo.ca/~cs487/handouts/script07.pdf

I tried implementing this, however it was tle probably due to higher constants in multiplication in NTT/FFT. It was taking around 12 seconds for the worst case -_-

Look at all possible remainders of x^2 modulo the given prime. How many different remainders exist?

Do the same with x^4, x^8 etc. How many remainders exist?

My approach uses this and it’s almost brute force.

Please look here http://e-maxx.ru/algo/fft_multiply if you don’t speak Russian (like me) use Google Translate.

I have discussed my approach in this page :

https://discuss.codechef.com/questions/82993/workchef-and-polyeval-problems-in-july-16?page=1#83020

I have shared my approach on this page:

https://discuss.codechef.com/questions/82993/workchef-and-polyeval-problems-in-july-16?page=1#83020

I did not use FFT / NTT in this problem. I used similar insight from @xellos0 's insight. I recursively decomposed a polynomial into four polynomials in terms of x^4 and used unordered_map to save the result for each decocmposition. I continued decomposition until I only have one term (constant term). The number of possible remainders when the function x^4 mod 786433 is repeatedly applied is reduced drastically every application which gives an opportunity for memoization / DP. I don’t know how to prove this mathematically, but I tried creating a program which counts the number of possible remainder and indeed this is true for powers of two. Notice that the decomposition would produce a tree with four childs, and so I used 4-ary heap-like indexing. https://www.codechef.com/viewsolution/10762921

You can decompose a polynomial into four polynomials in this way:

A(x) = A_{4k}(x^4) + x A_{4k+1}(x^4) + x^2 A_{4k+2}(x^4) + x^3 A_{4k + 3}(x^4)

IN the formula above, A_{4k}(x^4) are the coefficients that are multiples of 4, A_{4k+1}(x^4) are the coeffiients that are multiples of 4 but +1.

We can also decompose a polynomial into 8 polynomials in terms of x^8. I think I would have gotten faster running time if I used higher power such as 8 because the number of remainders reduce even faster.

I hope the logic is sufficiently understandable, let me know if there is something unclear with my explanation.

Same approach is described in Cormen Book too. But it seems hard to code.

No prime, every x up to the modulo.

A lot of the links here have broken, so not many of these answers are useful. mightymercado’s solution is interesting, but it does not use NTT, so I would like to post my O(n \log n) solution which uses a modified version of NTT.

See, NTT usually works by evaluating a polynomial at the 2^k-roots of unity in \Bbb{Z}_n. Since the modulo is 786433=3\cdot 2^{18}+1, the most obvious thing to do would be to use NTT to evaluate the polynomial at all of the 2^{18}-roots of unity in \Bbb{Z}_{786433}. However, this does not quite work because we need to evaluate the polynomial for ALL integers in \Bbb{Z}_{786433}, not just the 2^{18}-roots of unity.

Thus, we need two more critical observations: First, any non-zero integer in \Bbb{Z}_{786433} is a 3\cdot 2^{18}-root of unity because \phi(786433)=3\cdot 2^{18} and a^{\phi(n)} \equiv 1 \pmod n by Euler’s totient theorem. Second, NTT can easily be modified to evaluate a polynomial at all of the 3\cdot 2^{18}-roots of unity.

See, NTT is a divide-and-conquer process, meaning that first, it will evaluate a bunch of 2-coefficient polynomials using the 2^1-roots of unity, then evaluate a bunch of 4-coefficient polynomials using the 2^2-roots of unity, etc. We can modify this algorithm so that, before using the 2^1-roots of unity, we first evaluate a bunch of 3-coefficient polynomials using the 3-roots of unity. Then, using the results from this process, we can evaluate a bunch of 6-coefficient polynomials using the 3\cdot 2^1-roots of unity, then evaluate a bunch of 12-coefficient polynomials using the 3\cdot 2^2-roots of unity, etc. We continue doing this until we have evaluated the whole polynomial with 3\cdot 2^{18} coefficients using the 3\cdot 2^{18}-roots of unity. (Note that this will require padding the polynomial with 0s to make it have exactly 3\cdot 2^{18} coefficients.)

Once we are done with NTT, we have evaluated the polynomial at all of the non-zero elements of \Bbb{Z}_{786433}, so all we need to do now is read in the queries and spit out the answers.

For more detail about this algorithm, you can read the solution I linked at the beginning of the answer. My solution finished every sub-task in 0.2 seconds or less, which is ten times as fast as mightymercado’s solution, so I think this is a substantial improvement.