PROBLEM LINK:Author: Misha Chorniy DIFFICULTY:Hard PREREQUISITES:Fast Fourier Transform (FFT) PROBLEM:Given $N$ distinct points in 3 dimensional space and four integers $A$, $B$, $C$ and $D$, you are required to compute the following sum: $$\sum_{i \neq j} \frac{\mathopenA(X_{i}  X_{j}) + B(Y_{i}  Y_{j}) + C(Z_{i}  Z_{j}) + D\mathclose} {N(N1)\sqrt{(X_{i}  X_{j})^4 + (Y_{i}  Y_{j})^4 + (Z_{i}  Z_{j})^4}} $$ Also, you are given $Q$ different values $(A, B, C, D)$ and are required to compute the above sum $Q$ times. EXPLANATION:
There are 2 subtasks for this problem. For the first subtask, it is given that $Q\times N^2 ≤ 77777777$. This hints at using an $\mathcal{O}(N^2)$ solution per query. It is easy to see that the required sum contains exactly $N\times (N1)$ terms, and we can just simulate the calculation. Here is a code snippet for the same. Pseudo Code: func naive(*X, A, B, C, D) { ans = 0; for (int i = 1; i <= N; ++i) { for(int j = 1; j <= N; ++j) { if (i == j) continue; dx = X[i]  X[j]; dy = Y[i]  Y[j]; dz = Z[i]  Z[j]; num = abs(A * dx + B * dy + C * dz + D); den = sqrt(dx * dx * dx * dx + dy * dy * dy * dy + dz * dz * dz * dz); } } return ans / N / (N1); }
For the second subtask $N ≤ 777777$. Clearly, an $\mathcal{O}(N^2)$ solution per query will time out. Let's take a closer look at the other constraints. It is given that $1 ≤ X_{i}, Y_{i}, Z_{i} ≤ 77 $. This implies that $(X_{i}  X_{j})$ can only lie in the range $[76, 76]$, and can only take $153$ values. The same is true for $(Y_{i}  Y_{j})$ and $(Z_{i}  Z_{j})$. Hence, the triplet $(X_{i}  X_{j}, Y_{i}  Y_{j}, Z_{i}  Z_{j})$ can take at most $153^3 \approx 3.5 \times 10^6$ different values. If we can efficiently count the number of times a given triplet $(X_{i}  X_{j}, Y_{i}  Y_{j}, Z_{i}  Z_{j})$ occurs, then the problem can be solved easily using the following code snippet. Pseudo Code: V = 77, ans = 0; for (dx = (V1) to (V1)) { for (dy = (V1) to (V1)) { for (dz = (V1) to (V1)) { if (dx == 0 and dy == 0 and dz == 0) continue; num = cnt[(dx, dy, dz)] * abs(A * dx + B * dy + C * dz + D); den = sqrt(dx * dx * dx * dx + dy * dy * dy * dy + dz * dz * dz * dz); ans += num / den; } } } ans = ans / N / (N1);If we can precompute and store the values of cnt[(dx, dy, dz)] before hand, then the above code snippet runs in $\mathcal{O}(V^3)$ time per query, where $V = 77$. Hence, the overall time complexity for $Q$ queries is $\mathcal{O}(Q \times V^3)$, which will run in time.
Now, the only subproblem remaining is to count the number of times each triplet $(X_{i}  X_{j}, Y_{i}  Y_{j}, Z_{i}  Z_{j})$ occurs. To solve this problem, let us define a few quantities:
$$(X_{i}, Y_{i}, Z_{i}) => A[X_{i} + 2XY_{i} + 4XYZ_{i}] \mathrel{+}= 1$$ $$(X_{i}, Y_{i}, Z_{i}) => B[(L1) (X_{i} + 2XY_{i} + 4XYZ_{i})] \mathrel{+}= 1$$ Let us define a polynomial $C$ of degree $(2L1)$ as the product of the 2 polynomials $A$ and $B$. Let us consider any coefficient of $C$. We can see that it includes some $(X_{i} + 2XY_{i} + 4XYZ_{i})$ from $A$ and some $((L1)  (X_{j} + 2XY_{j} + 4XYZ_{j}))$ from $B$. In other words, for some index $k$ in $C$, we have: $$\begin{align}k &= X_{i} + 2XY_{i} + 4XYZ_{i} + (L1)  (X_{j} + 2XY_{j} + 4XYZ_{j}) \\ & = (T + M) + (X_{i}  X_{j}) + 2X(Y_{i}  Y{j}) + 4XY(Z_{i}  Z_{j}) \\ & = T + (X + (X_{i}  X_{j})) + 2X(Y + (Y_{i}  Y_{j})) + 4XY(Z + (Z_{i}  Z_{j})) \end{align} $$ We can now extract the values of dx, dy and dz using the following code. Pseudo Code: for (k = 0 to 2*L1) { if (C[k] == 0) continue; dx = (k  T) % (2*X)  X; dy = (k  T) % (4*X*Y) / (2*X)  Y; dz = (k  T) / (4*X*Y)  Z; if (dx == 0 and dy == 0 and dz == 0) continue; cnt[(dx, dy, dz)] = C[k]; The above code runs in $\mathcal{O}(L)$. Now, the only thing remaining is the multiplication of the two polynomials $A$ and $B$ to get $C$. This is a well known problem and can be solved by using Fast Fourier Transform (FFT). You can read more about the method here. The complexity for this step is $\mathcal{O}(L \times \log{L})$. From the above notations, $L \approx V^3$ where $V = max(X_{i}, Y_{i}, Z_{i})$. We can write the run time complexity of this step as $\mathcal{O}(V^3 + V^3 \times \log{V^3})$. Hence, the overall time complexity of this problem is $\mathcal{O}(Q \times V^3 + V^3 + V^3 \times \log{V^3})$, where $V = max(X_{i}, Y_{i}, Z_{i})$.
We have already designed a solution which should theoretically pass all the test cases within the given time limit. But the constants are very high, and the time limit is set such that solutions without further optimizations will fail. For multiplication of 2 polynomials, normally we would need to use FFT 3 times:
We have already seen above that the polynomial $C$ is of degree $(2L1)$. Also, from the definition of $A$ and $B$, it is trivial to note that $A_{i} = B_{(L1)i}$. Let us take a closer look at $B$. $$\begin{align}B(x) &= B_{0} + B_{1}x + B_{2}x^2 + ... B_{L1}x^{L1} \\ & = A_{L1} + A_{L2}x + A_{L3}x^2 + ... + A_{0}x^{L1} \\ & = x^{L1} (A_{L1}x^{(L1)} + A_{L2}x^{(L2)} + ... + A_{0}) \\ & = x^{L1} A(x^{1}) \\ & = x^{L1} A(1/x) \end{align} $$ Let $T[i]$ denote the value of $A(x)$ evaluated at $x = \omega_{2L}^{i}$. Here, $\omega_{N}$ denotes the $N^{\text{th}}$ complex root of unity. The first FFT calculates the values of $T[i]$, for $(0 ≤ i ≤ 2L1)$. Now, we can use the fact that $B_{x} = x^{L1} A(1/x)$. For any $\omega_{2L}^{i}$, we have $B(\omega_{2L}^{i}) = {\omega_{2L}^{(L1)i}} A(\omega_{2L}^{i})$ which can be written as ${\omega_{2L}^{((L1)\times i) \bmod 2L}} A(\omega_{2L}^{2Li})$, since $\omega_{2L}^{2L} = 1$. Therefore, we have $B(\omega_{2L}^{i}) = {\omega_{2L}^{((L1)\times i) \bmod 2L}} T[2Li]$. Hence, the DFT of B can be computed in $\mathcal{O}(L)$ and there is no need to perform the second FFT. The final FFT to calculate $C$ is however needed. Hence, we can compute the result with only 2 FFTs instead of 3. The tester's solution uses this method to compute the solution. You can take a look at it for more details.
There is another method to reduce the number of FFTs from 3 to 2. Let us define 2 new polynomials as follows:
Here, $i$ denotes $\sqrt{1}$. Let $F_{p}[k]$ denote the value of $P(x)$ evaluated at $x = \omega_{2L}^{k}$ and let $F_{q}[k]$ denote the value of $Q(x)$ evaluated at $x = \omega_{2L}^{k}$. Again, $\omega_{N}$ denotes the $N^{\text{th}}$ complex root of unity. Let's try to expand both these quantities. $$\begin{align}
F_{p}[k] &= A(\omega_{2L}^{k}) + i B(\omega_{2L}^{k}) \\
& = \sum_{j=0}^{2L1} A_{j} \omega_{2L}^{jk} + i B_{j} \omega_{2L}^{jk} \\
& = \sum_{j=0}^{2L1} (A_{j} + i B_{j}) \left(\cos \left(\frac{2 \pi jk}{2L}\right) + i \sin \left(\frac{2 \pi jk}{2L}\right)\right)
\end{align}
$$
Hence, we make one FFT call to compute the DFT of $P$. We have seen that the DFT of $Q$ can be computed in $\mathcal{O}(L)$. Using this information, it is trivial to calculate the DFT of $A$ and $B$.
A second FFT to calculate $C$ is again needed. Hence, we can compute the result with only 2 FFTs instead of 3 with this method too. The beauty of this method is that it is generic and can be used for the multiplication of any 2 polynomials. :) The setter's solution uses this method to compute the solution. You can take a look at it for more details. P.S. Some submissions using 3 FFTs managed to pass, in spite of the stringent time limit. :) COMPLEXITY:$\mathcal{O}(Q \times V^3 + V^3 \times \log{V^3})$, where $V = max(X_{i}, Y_{i}, Z_{i})$. SAMPLE SOLUTIONS:
This question is marked "community wiki".
asked 05 Sep '15, 17:06

Answer is hidden as author is suspended. Click here to view.
answered 14 Sep '15, 15:30
Eliminate the calls to pow. Either use simple arithmetic as in the solution given above (will make your code somewhat faster). Or even better use a lookup table for the 76 necessary values and precompute it outside the loop.
(16 Sep '15, 03:37)

Function pow(something, 4) works very long. If you rewrite next line in your code, you'll get 15 pts. sqrt(pow(x_d,4)+pow(y_d,4)+pow(z_d,4)); > sqrt(x_d * x_d * x_d * x_d + y_d * y_d * y_d * y_d + z_d * z_d * z_d * z_d); Let's solve problem for 1D, given N points with coordinates Xi(Xi >= 0 && Xi <= 1e5), how to know how many times vector (Xi  Xj, 0, 0) will be counted between all pairs (i, j), let's construct two polynoms A and B: MAX = max(X[i]) A[X[i]] = 1 B[MAX  X[i]] = 1 If we multiply these polynoms: C[i + j] = Sum (A[i] * B[j])  Simple multiply (but with FFT not simple) C[X[i] + (MAX  X[j])] = Sum (A[X[i]] * B[X[j]]) C[MAX + (X[i]  X[j])] = Sum (A[x[i]] * B[X[j]]) For 3D: Let XMAX = max(X[i]), YMAX = max(Y[i]), ZMAX = max(Z[i]) So construct A and B: X[i]  X[j] < XMAX Y[i]  Y[j] < YMAX Z[i]  Z[j]] < ZMAX A[X[i] + 2 * XMAX * Y[i] + (2 * XMAX) * (2 * YMAX) * Z[i]] = 1 B[(XMAX  X[i]) + 2 * XMAX * (YMAX  Y[i]) + (2 * XMAX) * (2 * YMAX) * (ZMAX  Z[i])] = 1 C[i + j] = Sum(A[i] * B[j]) C[(X[i] + 2 * XMAX * Y[i] + (2 * XMAX) * (2 * YMAX) * Z[i]) + ((XMAX  X[j]) + 2 * XMAX * (YMAX  Y[j]) + (2 * XMAX) * (2 * YMAX) * (ZMAX  Z[j]))] = C[XMAX + (X[i]  X[j]) + 2 * XMAX * (YMAX + (Y[i]  Y[j])) + 4 * XMAX * YMAX * (ZMAX + (Z[i]  Z[j])] = A[(X[i] + 2 * XMAX * Y[i] + (2 * XMAX) * (2 * YMAX) * Z[i])] * B[((XMAX  X[j]) + 2 * XMAX * (YMAX  Y[j]) + (2 * XMAX) * (2 * YMAX) * (ZMAX  Z[j]))] XMAX + (X[i]  X[j]) > 0 && XMAX + (X[i]  X[j]) < 2 * XMAX YMAX + (Y[i]  Y[j]) > 0 && YMAX + (Y[i]  Y[j]) < 2 * YMAX ZMAX + (Z[i]  Z[j]) > 0 && ZMAX + (Z[i]  Z[j]) < 2 * ZMAX Optimized solutions with 3 FFT/NTT could get AC. No one has written solution with 2 FFT:) But you can see that optimized FFT works much faster that NTT. answered 15 Sep '15, 00:57
@mgch Thank you for the explanation. I understood the part with one dimension. But would it be wrong to choose A[xi + yi + zi] and B[MaxX  xi + MaxY  yi + MaxZ  zi] for 3d. What is the significance of 2MaxX and 2MaxX2MaxY here? It has probably got to do with the total possible combinations but can you elaborate a little?
(16 Sep '15, 00:16)
@eknoor292: It would be wrong because then all the triples (xi, yi, zi) with the same sum would be mapped to the same index in the array. E.g. (1,2,3), (1,1,4), (3,1,2) would all be mapped to 6 because they all have the same sum. The point of the mapping described in the editorial is to map each different triple (xi, yi, zi) to a different index in the array.
(16 Sep '15, 00:27)
@mugurelionut : Ah! that makes sense. Thanks a lot.
(16 Sep '15, 01:59)

Hmm, my solution passed by precomputing the values
Here is the solution: link answered 14 Sep '15, 16:54

How to reach these equations? Why have they been chosen the way they are right now. X=max(Xi)+1,Y=max(Yi)+1,Z=max(Zi)+1. M=X+2XY+4XYZ. T=(L−1)−M. As usual like previous editorials this one too skipped over the parts that needs to be explained well and over detailed other parts. answered 14 Sep '15, 22:43

@sarvagya3943, I also did the same think earlier. Here is a link to the solution https://www.codechef.com/viewsolution/8000952. It is assumed that 10^8 operations are performed per second by the online judge. Going the contraints, out solution should take atleast 7.78 (QNN <= 77777777) seconds which will definitely time out. One way to reduce number of calculations by NN is by omitting the step of dividing the answer by (N(N1)) at each step. It is a constant for given test file and can be stored at some other place and divided at once for the final answer as suggested in the editorial above. Some more optimizations can be seen in the solution link https://www.codechef.com/viewsolution/8085593 (using division by n*(n1) at each step) and https://www.codechef.com/viewsolution/8087235 (with the above said trick). Hope it helps. :) answered 14 Sep '15, 23:22
