PROBLEM LINKDIFFICULTYHARD PREREQUISITESDynamic Programming, Discrete Fourier Transform, Fast Fourier Transform PROBLEMYou are given a list of integers between 1 and 30000, inclusive. You have to find triplets i < j < k in A[], such that A[i], A[j] and A[k] are in Arithmetic Progression. QUICK EXPLANATIONThe trivial algorithm to solve this problem is O(N^{3}). This is of course too slow for N = 100,000. Let us consider the O(N^{2}) algorithm below to solve this problem. It will also be slow, but it gives us ideas that can lead to a solution. prev[1 ... 30000] = {0} // we will maintain frequencies here result = 0 for j = 1 to N // fix j for k = j+1 to N // fix k Ai = 2*A[j]  A[k] // the value at i result += prev[Ai] // add the number of choices for i prev[ A[j] ]++ Of course this is too slow. EXPLANATIONLet us consider an even slower algorithm. The reason to consider this algorithm, is because it introduces a very powerful idea to solve the problem. Once we understand how that works, we will talk about how we can solve the problem. prev[1 ... 30000] = {0} // we will maintain frequencies here next[1 ... 30000] = {0} // we will mataiain frequencies here too! This time, // we maintain frequencies of occurance 'after' j for v = 1 to N next[ A[v] ]++ for j = 1 to N next[ A[j] ] cnt[1 ... 60000] = {0} for Ai = 1 to 30000 for Ak = 1 to 30000 cnt[Ai + Ak] += prev[Ai] * next[Ak] result += cnt[ 2*A[j] ] prev[ A[j] ]++ Notice how we maintain cnt. This is actually equivalent to multiplication of two polynomials! Suppose we had two polynomials
The product of these two polynomials R, can be calculated as follows for i = 1 to 30000 for j = 1 to 30000 R[i+j] = P[i] * Q[j] We can optimize the product of the two polynomials by using Discrete Fourier Transforms! fP = DFT(P) fQ = DFT(Q) fR = {0} for i = 1 to 2^{16} fR[i] = fP[i] * fQ[i] R = iDFT(fR) Where, fR is the Discrete Fourier Transform of R, and iDFT is the inverse Discrete Fourier Transform method. Discrete Fourier Transforms, and inverse Discrete Fourier Transforms of an array of size N can be calculated in O(N log N) time using the Fast Fourier Transform method! The link to the same has been provided in the PREREQUISITES section above. Hence we can modify our O(N*30000^{2}) algorithm into a O(N*30000*16) algorithm. This, is still too slow to solve the problem. Now we introduce the idea that solves the problem. Optimization using BlocksSuppose we split the entire list into K blocks each of size N/K.
We have three cases to consider while choosing triplets.
This the overall complexity of the algorithm would be O(K*30000*16 + N*N/K) You can experiment and choose an optimal K. The tester's solution chooses K = 30, the setter's solution chooses K = 40. This will fit within the time limit. SETTERS SOLUTIONCan be found here. TESTERS SOLUTIONCan be found here.
This question is marked "community wiki".
asked 12 Nov '12, 21:00

Without the condition i < j < k the problem can be solved in O(N + V * log V) time, where V = 30000, using just one multiplication of polynomials. We simply should take the square of the polynomial with coefficients equal to cnt[x] which is the frequency of number x in the array. It was clear for me immediately after reading the problem since this version has appeared several times on different online judges. But condition i < j < k stops me for a while. I was still believing that using FFT is possible (and it seems true as editorial shows :)). But after several days of useless tries to apply FFT here I just decided to submit some optimized version of naive O(N * V) approach that is obvious modification of O(N * V * V) draft solution from the editorial: prev[1 ... 30000] = {0} next[1 ... 30000] = {0} for v = 1 to N next[ A[v] ]++ But I was absolutely sure that this should get TLE. (Now I see that even this solution can pass: proof). IMHO the main bottleneck is the multiplication operation in the loop. So I decided instead of calculating cnt each time, update the whole array cnt[] using only additions and subtractions. Indeed, when we made next[A[j]] we should decrease cnt[(x+A[j])/2] by prev[x] for each x of the same parity as A[j] (follows from the line cnt[Ai + Ak] += prev[Ai] * next[Ak] in the editorial). The similar things happens when prev[A[j]]++ happens. At first sight, I thought that values cnt[] do not fit in 32bit integer type and use In fact, I see now (by examining all 37 solutions) that almost all contestants (UPD 27 out of 37 with one having FFT in code but not using it :)) solved the problem without FFT and what is very curious that almost all top solutions do not use FFT. Well, my solution appears to be the slowest among all C++ solutions :) answered 12 Nov '12, 23:26
"@ kunalgaurav18 Because of the condition i < j < k. Well, for me this condition ruins very beautiful solution. Currently to handle this condition I have very bad solution that pass the TL which were very surprisingly for me." If it was not for your comment, I would never have tried my slow O(NM) at the first place. Thanks :)
(12 Nov '12, 23:38)
1
Hm... I knew that I shouldn't write this comment. Admins should bun me for ages because of it :)
(13 Nov '12, 00:01)
@anton_lunyov You are telling that without i < j < k condition, you can find all triplets easily? First of all I have no idea how. And if I have such count C, isn't answer to this question C/3! ?
(13 Nov '12, 01:59)
1
@betlista There is no connection between this two counts at all.
For this test As for the part of how to find count efficiently, I wrote this explicitly in my answer. After you find the square of the mentioned polynomial you can find the answer in O(V) time.
(13 Nov '12, 02:39)

This solution is probably one of the best solutions I've ever read... Thank you very much for the insights and tricks for our bag that every single long CC contest provides us... :D Hopefully, not so late, I might be on the "other side" as well!!! Fantastic community and awesome contests!! PS: thanks to NOV12 I learned some new data structures that allowed me to get an AC solution during a live contest and even though I only solved 2 problems I can't wait for the next contests Thank you all, Bruno answered 13 Nov '12, 04:49

But many bruteforce solutions pass this problem, for a simple operation like iterate through two array and found their dot product, if we use 16 point each pointer to a position 16*x+i, and let them do their work in a single loop. then the modern cpu can run it VERY fast. FFT's constant is very huge so it even run slower then that kind of bruteforce :(. answered 13 Nov '12, 12:54
