 MGCH3D - Editorial

#1

Author: Misha Chorniy
Tester: Kevin Atienza
Editorialists: Suhash Venkatesh and Pushkar Mishra

Hard

PREREQUISITES:

Fast Fourier Transform (FFT)

PROBLEM:

Given N distinct points in 3 dimensional space and four integers A, B, C and D, you are required to compute the following sum:

\sum_{i eq j} \frac{\mathopen|A(X_{i} - X_{j}) + B(Y_{i} - Y_{j}) + C(Z_{i} - Z_{j}) + D\mathclose|} {N(N-1)\sqrt{(X_{i} - X_{j})^4 + (Y_{i} - Y_{j})^4 + (Z_{i} - Z_{j})^4}}

Also, you are given Q different values (A, B, C, D) and are required to compute the above sum Q times.

EXPLANATION:

There are 2 subtasks for this problem. For the first subtask, it is given that Q imes N^2 ≤ 77777777. This hints at using an \mathcal{O}(N^2) solution per query. It is easy to see that the required sum contains exactly N imes (N-1) terms, and we can just simulate the calculation. Here is a code snippet for the same.

Pseudo Code:

func naive(*X, A, B, C, D) {
ans = 0;
for (int i = 1; i <= N; ++i) {
for(int j = 1; j <= N; ++j) {
if (i == j) continue;
dx = X* - X[j];
dy = Y* - Y[j];
dz = Z* - Z[j];
num = abs(A \* dx + B \* dy + C \* dz + D);
den = sqrt(dx \* dx \* dx \* dx + dy \* dy \* dy \* dy + dz \* dz \* dz \* dz);
}
}
return ans / N / (N-1);
}


For the second subtask N ≤ 777777. Clearly, an \mathcal{O}(N^2) solution per query will time out. Let’s take a closer look at the other constraints. It is given that 1 ≤ X_{i}, Y_{i}, Z_{i} ≤ 77 . This implies that (X_{i} - X_{j}) can only lie in the range [-76, 76], and can only take 153 values. The same is true for (Y_{i} - Y_{j}) and (Z_{i} - Z_{j}). Hence, the triplet (X_{i} - X_{j}, Y_{i} - Y_{j}, Z_{i} - Z_{j}) can take at most 153^3 \approx 3.5 imes 10^6 different values. If we can efficiently count the number of times a given triplet (X_{i} - X_{j}, Y_{i} - Y_{j}, Z_{i} - Z_{j}) occurs, then the problem can be solved easily using the following code snippet.

Pseudo Code:

V = 77, ans = 0;
for (dx = -(V-1) to (V-1)) {
for (dy = -(V-1) to (V-1)) {
for (dz = -(V-1) to (V-1)) {
if (dx == 0 and dy == 0 and dz == 0)    continue;
num = cnt[(dx, dy, dz)] * abs(A \* dx + B \* dy + C \* dz + D);
den = sqrt(dx \* dx \* dx \* dx + dy \* dy \* dy \* dy + dz \* dz \* dz \* dz);
ans += num / den;
}
}
}
ans = ans / N / (N-1);


If we can pre-compute and store the values of cnt[(dx, dy, dz)] before hand, then the above code snippet runs in \mathcal{O}(V^3) time per query, where V = 77. Hence, the overall time complexity for Q queries is \mathcal{O}(Q imes V^3), which will run in time.

Now, the only sub-problem remaining is to count the number of times each triplet (X_{i} - X_{j}, Y_{i} - Y_{j}, Z_{i} - Z_{j}) occurs. To solve this problem, let us define a few quantities:

• Let X = max(X_{i}) + 1, Y = max(Y_{i}) + 1, Z = max(Z_{i}) + 1.
• Let M = X + 2XY + 4XYZ.
• Let L be the smallest power of 2, such that L > M. For this problem, we can note that L = 2^{21} works for all cases.
• Let T = (L-1) - M.
• Let us define 2 polynomials A and B, each of degree L-1. For every input point (X_{i}, Y_{i}, Z_{i}), we update the coefficients of the 2 polynomials as follows (Note that all coefficients are initially 0):
(X_{i}, Y_{i}, Z_{i}) => A[X_{i} + 2XY_{i} + 4XYZ_{i}] \mathrel{+}= 1
(X_{i}, Y_{i}, Z_{i}) => B[(L-1) -(X_{i} + 2XY_{i} + 4XYZ_{i})] \mathrel{+}= 1

Let us define a polynomial C of degree (2L-1) as the product of the 2 polynomials A and B. Let us consider any coefficient of C. We can see that it includes some (X_{i} + 2XY_{i} + 4XYZ_{i}) from A and some ((L-1) - (X_{j} + 2XY_{j} + 4XYZ_{j})) from B. In other words, for some index k in C, we have:

\begin{align}k &= X_{i} + 2XY_{i} + 4XYZ_{i} + (L-1) - (X_{j} + 2XY_{j} + 4XYZ_{j}) \\ & = (T + M) + (X_{i} - X_{j}) + 2X(Y_{i} - Y{j}) + 4XY(Z_{i} - Z_{j}) \\ & = T + (X + (X_{i} - X_{j})) + 2X(Y + (Y_{i} - Y_{j})) + 4XY(Z + (Z_{i} - Z_{j})) \end{align}

We can now extract the values of dx, dy and dz using the following code.

Pseudo Code:

for (k = 0 to 2\*L-1) {
if (C[k] == 0)    continue;
dx = (k - T) % (2\*X) - X;
dy = (k - T) % (4\*X\*Y) / (2\*X) - Y;
dz = (k - T) / (4\*X\*Y) - Z;
if (dx == 0 and dy == 0 and dz == 0)    continue;
cnt[(dx, dy, dz)] = C[k];
}


The above code runs in \mathcal{O}(L). Now, the only thing remaining is the multiplication of the two polynomials A and B to get C. This is a well known problem and can be solved by using Fast Fourier Transform (FFT). You can read more about the method here. The complexity for this step is \mathcal{O}(L imes \log{L}). From the above notations, L \approx V^3 where V = max(X_{i}, Y_{i}, Z_{i}). We can write the run time complexity of this step as \mathcal{O}(V^3 + V^3 imes \log{V^3}). Hence, the overall time complexity of this problem is \mathcal{O}(Q imes V^3 + V^3 + V^3 imes \log{V^3}), where V = max(X_{i}, Y_{i}, Z_{i}).

Further Optimization

We have already designed a solution which should theoretically pass all the test cases within the given time limit. But the constants are very high, and the time limit is set such that solutions without further optimizations will fail.

For multiplication of 2 polynomials, normally we would need to use FFT 3 times:

1. We perform a FFT to calculate the DFT of the polynomial A(x).
2. We perform another FFT to calculate the DFT of the polynomial B(x).
3. Finally, we need to perform an inverse FFT to calculate the product C(x).

We have already seen above that the polynomial C is of degree (2L-1). Also, from the definition of A and B, it is trivial to note that A_{i} = B_{(L-1)-i}. Let us take a closer look at B.

\begin{align}B(x) &= B_{0} + B_{1}x + B_{2}x^2 + ... B_{L-1}x^{L-1} \\ & = A_{L-1} + A_{L-2}x + A_{L-3}x^2 + ... + A_{0}x^{L-1} \\ & = x^{L-1} (A_{L-1}x^{-(L-1)} + A_{L-2}x^{-(L-2)} + ... + A_{0}) \\ & = x^{L-1} A(x^{-1}) \\ & = x^{L-1} A(1/x) \end{align}

Let T* denote the value of A(x) evaluated at x = \omega_{2L}^{i}. Here, \omega_{N} denotes the N^{ ext{th}} complex root of unity. The first FFT calculates the values of T*, for (0 ≤ i ≤ 2L-1). Now, we can use the fact that B_{x} = x^{L-1} A(1/x).

For any \omega_{2L}^{i}, we have B(\omega_{2L}^{i}) = {\omega_{2L}^{(L-1)i}} A(\omega_{2L}^{-i}) which can be written as {\omega_{2L}^{((L-1) imes i) \bmod 2L}} A(\omega_{2L}^{2L-i}), since \omega_{2L}^{2L} = 1.

Therefore, we have B(\omega_{2L}^{i}) = {\omega_{2L}^{((L-1) imes i) \bmod 2L}} T[2L-i]. Hence, the DFT of B can be computed in \mathcal{O}(L) and there is no need to perform the second FFT. The final FFT to calculate C is however needed. Hence, we can compute the result with only 2 FFTs instead of 3. The tester’s solution uses this method to compute the solution. You can take a look at it for more details.

**Alternative Solution**

There is another method to reduce the number of FFTs from 3 to 2. Let us define 2 new polynomials as follows:

• P(x) = A(x) + i B(x)
• Q(x) = A(x) - i B(x)

Here, i denotes \sqrt{-1}. Let F_{p}[k] denote the value of P(x) evaluated at x = \omega_{2L}^{k} and let F_{q}[k] denote the value of Q(x) evaluated at x = \omega_{2L}^{k}. Again, \omega_{N} denotes the N^{ ext{th}} complex root of unity. Let’s try to expand both these quantities.

\begin{align} F_{p}[k] &= A(\omega_{2L}^{k}) + i B(\omega_{2L}^{k}) \\ & = \sum_{j=0}^{2L-1} A_{j} \omega_{2L}^{jk} + i B_{j} \omega_{2L}^{jk} \\ & = \sum_{j=0}^{2L-1} (A_{j} + i B_{j}) \left(\cos \left(\frac{2 \pi jk}{2L}\right) + i \sin \left(\frac{2 \pi jk}{2L}\right)\right) \end{align}

\begin{align} F_{q}[k] &= A(\omega_{2L}^{k}) - i B(\omega_{2L}^{k}) \\ & = \sum_{j=0}^{2L-1} A_{j} \omega_{2L}^{jk} - i B_{j} \omega_{2L}^{jk} \\ & = \sum_{j=0}^{2L-1} (A_{j} - i B_{j}) \left(\cos \left(\frac{2 \pi jk}{2L}\right) + i \sin \left(\frac{2 \pi jk}{2L}\right)\right) \\ & = \sum_{j=0}^{2L-1} \left(A_{j} \cos \left(\frac{2 \pi jk}{2L}\right) + B_{j} \sin \left(\frac{2 \pi jk}{2L}\right)\right) + i \left(A_{j} \sin \left(\frac{2 \pi jk}{2L}\right) - B_{j} \cos \left(\frac{2 \pi jk}{2L}\right)\right) \\ & = ext{conj} \left( \sum_{j=0}^{2L-1} \left(A_{j} \cos \left(\frac{2 \pi jk}{2L}\right) + B_{j} \sin \left(\frac{2 \pi jk}{2L}\right)\right) - i \left(A_{j} \sin \left(\frac{2 \pi jk}{2L}\right) - B_{j} \cos \left(\frac{2 \pi jk}{2L}\right)\right) \right) \\ & = ext{conj} \left( \sum_{j=0}^{2L-1} \left(A_{j} \cos \left(\frac{-2 \pi jk}{2L}\right) - B_{j} \sin \left(\frac{-2 \pi jk}{2L}\right)\right) + i \left(A_{j} \sin \left(\frac{-2 \pi jk}{2L}\right) + B_{j} \cos \left(\frac{-2 \pi jk}{2L}\right)\right) \right) \\ & = ext{conj} \left( \sum_{j=0}^{2L-1} (A_{j} + i B_{j}) \left(\cos \left(\frac{-2 \pi jk}{2L}\right) + i \sin \left(\frac{-2 \pi jk}{2L}\right)\right)\right) \\ & = ext{conj} \left( \sum_{j=0}^{2L-1} (A_{j} + i B_{j}) \omega_{2L}^{-jk} \right) \\ & = ext{conj} \left( \sum_{j=0}^{2L-1} (A_{j} + i B_{j}) \omega_{2L}^{(2L-k)j} \right) \\ & = ext{conj} (F_{p}[2L-k]) \end{align}

Hence, we make one FFT call to compute the DFT of P. We have seen that the DFT of Q can be computed in \mathcal{O}(L). Using this information, it is trivial to calculate the DFT of A and B.

• FFT(A[k]) = (F_{p}[k] + F_{q}[k]) / 2
• FFT(B[k]) = i (F_{q}[k] - F_{p}[k]) / 2 (Why?)

A second FFT to calculate C is again needed. Hence, we can compute the result with only 2 FFTs instead of 3 with this method too. The beauty of this method is that it is generic and can be used for the multiplication of any 2 polynomials. The setter’s solution uses this method to compute the solution. You can take a look at it for more details.

P.S. Some submissions using 3 FFTs managed to pass, in spite of the stringent time limit. COMPLEXITY:

\mathcal{O}(Q imes V^3 + V^3 imes \log{V^3}), where V = max(X_{i}, Y_{i}, Z_{i}).

SAMPLE SOLUTIONS:

#2

I did the same thing for subtask 1 but got TLE for the second case. https://www.codechef.com/viewsolution/8088795
Can someone tell me why !

#3

Hmm, my solution passed by precomputing the values

cnt[(dx, dy, dz)] / sqrt(dx^4 + dy^4 + dz^4) for each dx, dy, dz and using 3 NTT’s with N = 2^22. However, I further optimized the calculations so that I would elliminate multiplications with 1 and 0 and other irrelevant calculations (I still think this part is not needed though). It passed with ~6s on the worst case.

Here is the solution:

#4

How to reach these equations? Why have they been chosen the way they are right now.

X=max(Xi)+1,Y=max(Yi)+1,Z=max(Zi)+1.

M=X+2XY+4XYZ.

T=(L−1)−M.

As usual like previous editorials this one too skipped over the parts that needs to be explained well and over detailed other parts.

#5

@sarvagya3943, I also did the same think earlier. Here is a link to the solution https://www.codechef.com/viewsolution/8000952.

It is assumed that 10^8 operations are performed per second by the online judge. Going the contraints, out solution should take atleast 7.78 (QNN <= 77777777) seconds which will definitely time out. One way to reduce number of calculations by NN is by omitting the step of dividing the answer by (N(N-1)) at each step. It is a constant for given test file and can be stored at some other place and divided at once for the final answer as suggested in the editorial above.

Some more optimizations can be seen in the solution link https://www.codechef.com/viewsolution/8085593 (using division by n*(n-1) at each step) and https://www.codechef.com/viewsolution/8087235 (with the above said trick).

Hope it helps. #6

@eknoor292

Because you want to pack 3 values into a single value. If you want to do (x,y,z)–>w then the obvious way is w=x+(xmax+1)y+(xmax+1)(ymax+1)*z. The use of xmax+1 and ymax+1 ensure that you get no overlap.

Now, in the current problem, we are going to use negative value as well, because at the end we want to track Xi-Xj. So we need to manage x and -x, which gives {-xmax,xmax} as the values range. It is renormalised to become {0, 2 xmax}. Hence the fact that we have 2X and 2X*2Y=4XY, with X=xmax+1 and Y=ymax+1.

It is exactly the same thing you do when you say a 16 bit value represents two 8 bits values (with w=x+256*y).

Also L comes from the fact that we cannot use negative powers in the polynom. So instead of -x we use L-x. Again it is the same thing you we when you manage negative integers, -1=0x1111111111111111…=2^N-1 (with N =16, 32… depending of your language / CPU…)

#7

@sarvagya3943

Function pow(something, 4) works very long. If you rewrite next line in your code, you’ll get 15 pts.

sqrt(pow(x_d,4)+pow(y_d,4)+pow(z_d,4)); -> sqrt(x_d * x_d * x_d * x_d + y_d * y_d * y_d * y_d + z_d * z_d * z_d * z_d);

@eknoor292

Let’s solve problem for 1D, given N points with coordinates Xi(Xi >= 0 && Xi <= 1e5), how to know how many times vector (Xi - Xj, 0, 0) will be counted between all pairs (i, j), let’s construct two polynoms A and B:

MAX = max(X*)

A[X*] = 1

B[MAX - X*] = 1

If we multiply these polynoms:

C[i + j] = Sum (A* * B[j]) - Simple multiply (but with FFT not simple)

C[X* + (MAX - X[j])] = Sum (A[X*] * B[X[j]])

C[MAX + (X* - X[j])] = Sum (A[x*] * B[X[j]])

For 3D:
Let XMAX = max(X*), YMAX = max(Y*), ZMAX = max(Z*)
So construct A and B:

|X* - X[j]| < XMAX

|Y* - Y[j]| < YMAX

|Z* - Z[j]] < ZMAX

A[X* + 2 * XMAX * Y* + (2 * XMAX) * (2 * YMAX) * Z*] = 1

B[(XMAX - X*) + 2 * XMAX * (YMAX - Y*) + (2 * XMAX) * (2 * YMAX) * (ZMAX - Z*)] = 1

C[i + j] = Sum(A* * B[j])

C[(X* + 2 * XMAX * Y* + (2 * XMAX) * (2 * YMAX) * Z*) +

((XMAX - X[j]) + 2 * XMAX * (YMAX - Y[j]) + (2 * XMAX) * (2 * YMAX) * (ZMAX - Z[j]))] =

C[XMAX + (X* - X[j]) + 2 * XMAX * (YMAX + (Y* - Y[j])) + 4 * XMAX * YMAX * (ZMAX + (Z* - Z[j])] =

A[(X* + 2 * XMAX * Y* + (2 * XMAX) * (2 * YMAX) * Z*)] *

B[((XMAX - X[j]) + 2 * XMAX * (YMAX - Y[j]) + (2 * XMAX) * (2 * YMAX) * (ZMAX - Z[j]))]

XMAX + (X* - X[j]) > 0 && XMAX + (X* - X[j]) < 2 * XMAX

YMAX + (Y* - Y[j]) > 0 && YMAX + (Y* - Y[j]) < 2 * YMAX

ZMAX + (Z* - Z[j]) > 0 && ZMAX + (Z* - Z[j]) < 2 * ZMAX

Optimized solutions with 3 FFT/NTT could get AC.
No one has written solution with 2 FFT:)

But you can see that optimized FFT works much faster that NTT.

#8

@mgch Thank you for the explanation. I understood the part with one dimension. But would it be wrong to choose A[xi + yi + zi] and B[MaxX - xi + MaxY - yi + MaxZ - zi] for 3d. What is the significance of 2MaxX and 2MaxX2MaxY here? It has probably got to do with the total possible combinations but can you elaborate a little?

#9

@eknoor292: It would be wrong because then all the triples (xi, yi, zi) with the same sum would be mapped to the same index in the array. E.g. (1,2,3), (1,1,4), (3,1,2) would all be mapped to 6 because they all have the same sum. The point of the mapping described in the editorial is to map each different triple (xi, yi, zi) to a different index in the array.

#10

Thanks a ton! The concept is much clearer in my head now. I wonder why it did not strike me before #11

@mugurelionut : Ah! that makes sense. Thanks a lot.

#12

Eliminate the calls to pow. Either use simple arithmetic as in the solution given above (will make your code somewhat faster). Or even better use a lookup table for the 76 necessary values and precompute it outside the loop.