PROBLEM LINK:
Author: Bogdan Ciobanu
Tester: Triveni Mahatha
Editorialist: Adarsh Kumar
DIFFICULTY:
Hard
PREREQUISITES:
Number Theoretic Transform, Combinatorics
PROBLEM:
You are given an array of length N. Initially, each element of array is painted with color 0. You are given K turns. In i^{th} turn you can paint any non-empty subarray with color i, while overwriting any color already present. You need to answer how many distinct colorings of the final array are possible modulo 163577857.
EXPLANATION:
Lets try to find the expression for F(K,N). Here F(K,N) will represent the answer to our given problem.
For doing the same, we will iterate over number of visible colors at the end. Say, number of visible colors are r in final array. Then, the number of ways of choosing r colors from K given colors will be \binom {K-1}{r-1} , because you need to select only r-1 colors from K-1 colors, as K^{th} color will always be visible.
Hence, we can write:
F(K,N) = \sum_{r=1}^{K} \binom {K-1}{r-1} e(r,N)
where, e(r,N) represents number of ways in which r colors can be visible finally in array of length N.
Finding an expression for e(r,N)
Lets try to think in reverse manner. Say r^{th} color occupy an interval of length x_1. Number of ways of doing the same will be N-x_1+1. Now, say (r-1)^{th} color occupy an interval of length x_2 in remaining N-x_1 places. You can notice that x_1 wont affect x_2. Basically, you can write:
e(r,N) = \sum_{x_1=1}^{N-r+1} (N-x_1+1) * e(r-1,N-x_1)
Try to expand the formula more and you will end up here,
e(r,N) = \sum \limits_{x_1=1} \sum \limits_{x_2=1}... \sum \limits_{x_r=1} (N-x_1+1)(N-x_1-x_2+1)...(N-x_1-x_2..-x_r+1)
If you observe it closely, you will find that it is nothing but sum of product of numbers taken r at a time from the set \{ 1,2,3,....N \}. Hence, we can say that e(r,N) is coefficient of x^{n-r} in polynomial P_N(x) where:
P_N(x) = (x+1)(x+2)(x+3).....(x+N)
All we are left is now computing the polynomial P_N(x) and we will be done.
Computation of P_N(x)
subtask #1 and #2
One can compute the product naively using brute force in order to pass these subtasks in O(N^2).
subtask #3
This subtask can be passed with an O(nlog^2n) solution using divide and conquer, along with NTT.
The task here is to compute the product of n polynomials of the form (x+i). Let A(x) be the product of the first half of (x+i)'s, and B(x) be the product of rest (x+i)'s. We compute A(x) and B(x) recursively, and then compute A(x)⋅B(x) by NTT. The time complexity is T(n)=2T(n/2)+O(nlogn), so T(n)=O(nlog^2n). This idea is illustrated by the following pseudocode:
MUL(L,R) // computes (x+L)(x+L+1)...(x+R)
if (L==R) then return (x+L)
mid = (L+R)/2
A = MUL(L,mid)
B = MUL(mid+1,R)
return A*B //A*B is computed by FFT
subtask #4
We will try to devise an O(nlogn) solution because O(nlog^2n) wont pass here because of strict Time Limit.
Informal way down to the solution
Let’s take a closer look at time complexity of divide and conquer solution:
T(n) = T(left_{half}) + T(right_{half}) + O(merge)
Since merge here is nlogn, T(n) comes out to be O(nlog^2n). But what if we can somehow compute right_{half} using left_{half} in O(merge) too. This way the recurrence relation will be:
T(n) = T(left_{half}) + O(merge) + O(merge)
\Rightarrow T(n) = T(n/2) + O(nlogn)
which eventually results in T(n) = O(nlogn) with some constant factor. So we are left with computing right_{half} using left_{half}.
Formally,
We can compute P_{2N}(x) using the identity:
P_{2N}(x) = P_{N}(x) * P_{N}(x+N)
Main problem here is, finding P_{N}(x+N) when P_{N}(x) is known to you. Say,
P_N(x) = \prod \limits_{i=1}^N (x+i) = \sum_{i=0}^N c_i.x^i
where all c_i's are known to you. Then,
P_N(x+N) = \prod \limits_{i=1}^N (x+N+i) = \sum_{i=0}^N c_i.(x+N)^i
\Rightarrow P_N(x+N) = \sum_{i=0}^N h_i.x^i
where,
h_i = \sum \limits_{j=i}^N c_j . \binom{j}{i} . N^{j-i}
\Rightarrow h_i = \frac{1}{i!} \sum \limits_{j=i}^N (c_j.j!)\left (\frac{n^{j-i}}{(j-i)!} \right )
$\Rightarrow h_i = \frac{1}{i!} . ($coefficient of x^{N-i} in A(x)B(x))
where,
A(x) = \sum \limits_{i=0}^{N} (c_{N-i}.(N-i)!) . x^i, and B(x) = \sum \limits_{i=0}^{N} \left( \frac {N^i}{i!} \right) . x^i
Now we have an algorithm to find P_N(x+N) in O(NlogN) using convolution of A(x) and B(x) when P_N(x) is known to us. Now let’s take a look at the pseudocode to solve this subtask:
MUL(N) // computes (x+1)(x+2)...(x+N) in O(NlogN)
if N==1:
return (x+1)
C = MUL(N/2)
H = convolute(A,B) // use C to obtain A
ANS = convolute(C,H)
if N is odd:
ANS *= (x+N) // naive multiplication will do - O(N)
return ANS
Only thing, that is left out is how to multiply two polynomials modulo 163577857. Notice that 163577857 = 39.2^{22} + 1, which is NTT friendly. This reduces pain for us. For those who are not familiar with NTT may refer this link.
Optimizations
When you implement this algorithm naively, you might notice that it takes too much time to process an input for N=10^6. Unfortunately, the constant hidden in the O notation seems to be too high! The time complexity is correct, so we will need additional tricks to cut down this constant and be able to squeeze the solution into the time limit. Here are a few optimizations:
- Try reducing the mod operation as much as possible everywhere.
- Use array instead of vector for NTT. Basically, use a fast NTT code.
- Memory usage: Try not to use excess memory as it slows down the code. A runtime difference of upto 0.5s can incur between 130M and 70M, depending on other parameters.
- In addition to these, you may need to optimize your code in other ways like changing the data type if it still doesn't pass the time limit.
Alternate Solution
This solution was not intended. However, 40\% of the solvers used this trick. The idea is to use the solution for subtask #3 to pass subtask #4 too. A naive implementation of subtask #3 for all the test cases leads to O(T.N.log^2N) solution. If we can remove the T part and do some optimization we will be able to pass subtask #4 too.
How to remove the T part? Let’s say that value of N over different test cases is represented by N_1, N_2, N_3, ..., N_T. We will take all input at once and sort the values of N in ascending order. Let n_i's represent the sorted sequence of N now. We will now call MUL(1,n_1), MUL(n_1+1,n_2), … , MUL(n_{T-1}+1,n_T). Now, we can use prefix multiplication of these terms to find the answer for all test cases. The time complexity of the same can be found approximately using following sum:
O(N_1log^2N) + O((N_2-N_1)log^2N) + ..... + O((N_T-N_{T-1})log^2N) = O(Nlog^2N)
Time Complexity:
O(T.N.logN) or O(N.log^2N)