PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: raysh07
Tester: sushil2006
Editorialist: iceknight1093
DIFFICULTY:
Medium
PREREQUISITES:
Combinatorics, NTT, divide and conquer
PROBLEM:
Given N and M, count the number of arrays A of length N with elements in [0, N] such that no subarray of A has its MEX equal to M.
In this version of the problem, M \le N \le 10^5.
EXPLANATION:
From the easy version of the problem, recall the the idea behind the solution is to ensure that between any two consecutive occurrences of M in the array, the MEX is not M.
To count this, we had two stages to the solution:
- First, for each i = 0, 1, 2, \ldots, N, we computed the number of arrays of length i that don’t contain M and have MEX strictly less than M.
This was the value w_i.- To do this, we utilized the auxiliary table ways[i][j], the number of arrays of length i without M that contain exactly j distinct elements smaller than M.
- The complexity here was \mathcal{O}(NM).
- Then, using this, we computed dp_i = \sum_{j \lt i} dp_j \cdot w_{i-j-1}.
The final answer was dp_{N+1}.- The complexity here was \mathcal{O}(N^2).
Now that N, M \le 10^5, both parts are slow and require optimization.
One of them is “easy” to optimize, that being the eventual dp computation.
Assume that we (somehow) compute the w array quickly.
Then, we want to find
Note that this looks quite like a convolution of the arrays dp and w, which normally can be computed quickly with FFT/NTT.
However, the issue here is that entries of dp are themselves being computed on-the-fly via previous entries; so we can’t exactly apply FFT (which doesn’t work left-to-right.)
However, it’s indeed possible to compute this dp quickly enough, using the technique of “online FFT” - which is really just a divide-and-conquer algorithm (the technique is often referred to as “CDQ Divide and Conquer”.)
For more details, read this blog on the topic.
To summarize, once the array w is known, the array dp can be computed in \mathcal{O}(N\log^2 N).
So, we shift focus to computing w.
We want w_i = number of arrays of length i containing elements in [0, M-1] \cup [M+1, N] such that at least one element \lt M is missing.
To compute this, we’ll instead count arrays that contain every element in [0, M-1] and subtract that from the total.
Since the values in [M+1, N] are extraneous, let’s ignore them for now - we’ll bring them back later. Instead, we focus on the values [0, M-1] only.
Define f(x, M) to be the number of arrays of length x with elements in [0, M-1] that contain every element from 0 to M-1 at least once.
To compute f(x, M), we use divide and conquer on values.
Let H = \left\lfloor \frac{M}{2} \right\rfloor.
We’ll place the elements [0, H] and [H+1, M-1] separately.
Suppose the elements in [0, H] occupy i positions out of x. Then,
- There are \binom{x}{i} ways to choose these i positions.
- There are f(i, H+1) ways to arrange the elements [0, H] into these positions while ensuring everything appears at least once.
- There are f(x-i, M-H) ways to arrange the elements [H+1, M-1] into the remaining positions.
Since i can be anything, we obtain
Expanding the binomial coefficient, this becomes
Observe that this is just a convolution of the f(\cdot, H+1) and f(\cdot, M-H) functions.
If those are computed recursively, we can thus compute f(x, M) for all x quickly using NTT to compute their convolution.
This gives us an algorithm that runs in \mathcal{O}(N\log N\log M) time.
There are other methods as well, though (to the editorialist’s knowledge) all requiring convolutions.
For example, the values we want to compute are closely related to the Stirling numbers of the second kind - see Library Checker for computing those.
Now that f(x, M) is known, let’s define g(x) to be the number of arrays of length x that:
- Have elements in [0, M-1] \cup [M+1, N],
- Contain at least one occurrence of each of 0, 1, \ldots, M-1.
g(x) can be computed using a similar idea to how we setup a recurrence for f(x, M).
Let’s fix i to be the number of indices containing values \gt M.
Then,
- There are \binom{x}{i} ways to choose the indices.
- There are (N-M)^i ways to choose the values at these indices.
- There are f(x-i, M) ways to arrange the elements [0, M-1] at the remaining indices.
So,
This is again a convolution, so once f(\cdot, M) is know we can compute g(x) in \mathcal{O}(N\log N) time.
Finally, since g(x) is the number of “bad” arrays of length x, we have
since there are N^x total possible arrays of length x that don’t include M.
Once all the w_x values are known, we can compute the final dp array using divide and conquer/“online FFT” as described at the beginning, thus giving the final answer.
TIME COMPLEXITY:
\mathcal{O}(N\log N\log M) per testcase.
CODE:
Author's code (C++)
Coming soon.