1) Can you estimate the time complexity of the official solution ? (per test case, as a function of ~~n)
~~n)
**Editorialist reply: Done. Check the editorial.**
I am asking because at some point I had an **O(n^3)** per test case solution and was too slow (I mean O(n^3) 128-bit operations). I had to come up with extra ideas in order to cut the time complexity down to **O(n^2 * log(n))** per test case (note that I don't mean base 2 logarithm). Of course, the time complexity per test case is not all that matters - preprocessing should be counted, too (my solution had *O(n^3)* preprocessing).
2) During the contest (while searching the web in the hope of finding useful mathematics to use for solving the problem) I did come across the e-olimp problem you mentioned (with n <= 40). I didn't know there was any editorial available and I noticed that Anton was the only one with an accepted solution at that problem on e-olimp :) Still.. I though that Russian or Ukrainian competitors would have an advantage because of that problem.. it seems that I was wrong.
3) During the contest I did come across Granville's paper (among many other papers). But I was looking for simpler equations/formulas so I automatically skipped Theorem 2, which looked rather complicated to me (so I thought it wouldn't be useful) :) In the end, I used only elementary mathematics in my solution (and I had to prove everything myself from scratch, except for how to compute the modular multiplicative inverse of an odd number modulo 2^n). I think @scli used the property from Granville's paper (I browsed his solution earlier and I noticed a i * i - j * j somewhere : I didn't understand it then, but now it's starting to make sense)