PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: raysh07
Tester: iceknight1093
Editorialist: iceknight1093
DIFFICULTY:
Medium
PREREQUISITES:
Dynamic programming
PROBLEM:
You’re given a partially filled permutation Q.
Across all ways to complete Q and obtain a permutation P, find the maximum possible value of the sum of \text{MEX} over all subarrays of P, i.e. \sum_{1 \le l \le r \le N} \text{MEX}(P_l, P_{l+1}, \ldots, P_r).
EXPLANATION:
To begin, we first need to transform the computation of sum-over-subarray-MEXes into a form that’s nicer to work with.
One way to think of the MEX of a set S is as follows.
- Start with MEX = 0.
- Then, for each integer k \ge 0: if S contains all the elements 0, 1, 2, \ldots, k, increase MEX by 1.
It should be obvious that this computes \text{MEX}(S) correctly.
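As a quick illustration, here is a minimal sketch of this incremental way of computing the MEX; the helper name mex_of is purely illustrative.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Incremental MEX: keep increasing the answer while all of 0, 1, ..., k
// are present in S (equivalently, while the current candidate is present).
int mex_of(const vector<int>& s) {
    set<int> have(s.begin(), s.end());
    int mex = 0;
    while (have.count(mex)) ++mex;
    return mex;
}
```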
Now, keeping this viewpoint in mind, let’s define c_i to be the number of subarrays of P that contain all the values \{0, 1, \ldots, i\}.
Note that we don’t care whether the subarray contains values greater than i; as long as it contains everything till i it will add to c_i.
In particular this means subarrays can contribute to multiple c_i; for example the subarray [0, 4, 1, 2] would contribute to all three of c_0, c_1, c_2.
Observe that with this definition, the sum of MEX-es across all subarrays is simply \sum_{i \ge 0} c_i.
This is simply because a subarray with MEX M will always add 1 to exactly all of c_0, c_1, \ldots, c_{M-1}.
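For example, take P = [0, 2, 1] (so N = 3). The six subarrays have MEX values 1, 0, 0 (the single elements), 1, 0 (the two subarrays of length 2), and 3 (the whole array), for a total of 5. On the other hand, c_0 = 3 (the subarrays containing the value 0), while c_1 = c_2 = 1 (only the whole array contains both 0 and 1, and likewise all of 0, 1, 2); indeed c_0 + c_1 + c_2 = 5.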
This viewpoint is particularly useful because computing the number of subarrays containing all of 0, 1, 2, \ldots, i (without regard for higher elements) is actually quite easy:
- Let L_i = \min(\text{pos}_0, \text{pos}_1, \ldots, \text{pos}_i).
That is, L_i is the index of the leftmost occurring element among 0, 1, \ldots, i.
- Let R_i = \max(\text{pos}_0, \text{pos}_1, \ldots, \text{pos}_i).
That is, R_i is the index of the rightmost occurring element among 0, 1, \ldots, i.
- Then, any subarray that contains both indices L_i and R_i will contain all the required values; and vice versa.
The number of such subarrays is hence just L_i \times (N+1-R_i).
(This is using 1-indexing).
In particular, note that these minimal subarrays form a chain under the containment relation, i.e. the minimal subarray for i will always be contained inside the minimal subarray for i+1.
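As a small illustration, here is a minimal sketch of how all the c_i can be computed in \mathcal{O}(N) for a fully known permutation; the function name and the pos array (1-indexed positions of each value) are assumptions made for this snippet.

```cpp
#include <bits/stdc++.h>
using namespace std;

// pos[v] = 1-indexed position of value v in the permutation P of size N.
// Returns c[i] = number of subarrays containing all of 0, 1, ..., i,
// which equals L_i * (N + 1 - R_i).
vector<long long> count_covering_subarrays(const vector<int>& pos, int N) {
    vector<long long> c(N);
    int L = N + 1, R = 0;                 // running min/max of pos[0..i]
    for (int i = 0; i < N; ++i) {
        L = min(L, pos[i]);               // leftmost position among values 0..i
        R = max(R, pos[i]);               // rightmost position among values 0..i
        c[i] = 1LL * L * (N + 1 - R);
    }
    return c;
}
```

The sum of all entries of c is then exactly the sum of subarray MEX-es of P.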
Now, our aim is to fill in Q to maximize the sum of subarray MEX-es.
Intuitively, it makes sense to cluster smaller values near each other, so that they can all be included in a larger number of subarrays (hence increasing potential contribution to the sum).
In particular, we can make the following observation:
- Let i_1, i_2, \ldots, i_k be the indices that are empty in Q, from left to right.
- Let x_1, x_2, \ldots, x_k be the values that are missing from Q, from small to large.
- Then, there exists an optimal solution having the following property:
- For every j (1 \le j \le k), the values x_1, x_2, \ldots, x_j will be placed in some contiguous segment of empty indices.
That is, there will exist integers l, r such that r-l+1 = j and the values x_1, \ldots, x_j are placed (in some order) at indices i_l, i_{l+1}, \ldots, i_r.
This is not very hard to prove: it's trivially true for j = 1. If it holds for j but not for j+1, an exchange argument shows that moving the value x_{j+1} closer to the segment currently containing x_1, \ldots, x_j can only improve (or at least not worsen) the answer; so repeatedly doing this keeps an optimal solution optimal, giving us the property we need.
We can use this information to design a dynamic programming solution.
Define dp[L][R] to be the maximum possible answer considering only the contributions of subarrays lying in [L, R], assuming all -1's in this range have been filled up with some prefix of missing elements (in some order).
We now need to figure out transitions.
Let’s first look at the segment [L, R] itself.
Since we’re definitely replacing all -1's in this segment with some prefix of missing elements, even though we don’t know their order yet, the set of elements in this segment is known to us.
Let’s define M_{L, R} to be the largest integer such that all of 0, 1, 2, \ldots, M_{L, R} appear in [L, R] after this replacement (with M_{L, R} = -1 if 0 itself doesn’t appear).
Note that [L, R] can be the “minimal segment” only for values \le M_{L, R}; and even among those, some of the smaller values might have minimal segments that are properly contained in [L, R], which we don’t want to count here.
However, note that due to the containment property of the minimal segments, either all smaller minimal segments are entirely contained in [L, R-1], or all smaller minimal segments are entirely contained in [L+1, R].
So, suppose all of the smaller ones are contained in [L, R-1].
Then,
- The contribution of the range [L, R-1] along with segments within it is, by definition, dp[L][R-1].
- We then only need to care about the contribution of the segment [L, R] itself.
- For this, observe that all values \le M_{L, R-1} will have their minimal segments handled within [L, R-1] anyway.
So, only the values M_{L, R-1} + 1, \ldots, M_{L, R} will have [L, R] as their minimal segment.
- Each such value has a contribution of L\cdot (N+1-R).
- Thus, we obtain an overall value of dp[L][R-1] + L\cdot (N+1-R)\cdot (M_{L, R} - M_{L, R-1}).
The same reasoning applies if the smaller minimal segments are instead all contained in [L+1, R]; a similar computation then gives a value of
dp[L+1][R] + L\cdot (N+1-R)\cdot (M_{L, R} - M_{L+1, R}).
dp[L][R] is then just the maximum of these two values.
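Putting the two cases together, the transition can be sketched as follows (0-indexed, so the weight L\cdot (N+1-R) becomes (L+1)\cdot (N-R)). Here M[L][R] is assumed to be stored as the MEX of the filled segment [L, R], i.e. one more than the M_{L, R} defined above; since only differences of M values are used, either convention works. Its precomputation is described next.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Interval DP: dp[L][R] = best total contribution of subarrays whose
// minimal segment lies inside [L, R], under the prefix-filling assumption.
long long solve_dp(int n, const vector<vector<int>>& M) {
    vector<vector<long long>> dp(n, vector<long long>(n, 0));
    for (int i = 0; i < n; ++i)                     // base case: single cells
        dp[i][i] = 1LL * (i + 1) * (n - i) * M[i][i];
    for (int len = 2; len <= n; ++len) {
        for (int L = 0; L + len - 1 < n; ++L) {
            int R = L + len - 1;
            long long w = 1LL * (L + 1) * (n - R);  // subarrays containing both L and R
            dp[L][R] = max(dp[L][R - 1] + w * (M[L][R] - M[L][R - 1]),
                           dp[L + 1][R] + w * (M[L][R] - M[L + 1][R]));
        }
    }
    return dp[0][n - 1];
}
```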
Observe that as long as we know the M_{L, R} values, the above solution runs in \mathcal{O}(N^2) time since we have quadratic states and constant-time transitions from each.
All the M_{L, R} values can themselves be precomputed in quadratic time too.
To do this, fix a value of L, and then compute the value for all R \ge L by processing them in ascending order.
This is relatively simple: keep an array marking which elements have been used already; and when processing index R,
- If Q_R \ge 0, mark Q_R as used.
- If Q_R = -1, mark the next smallest missing element as used instead.
- Then, as long as the current answer is marked, increase it.
This two-pointer algorithm runs in \mathcal{O}(N) for a fixed L, since the answer is non-decreasing with R and also doesn’t exceed N.
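A minimal sketch of this precomputation, assuming q holds the partially filled array (with -1 for empty cells) and unused lists the missing values in increasing order; M[L][R] is stored as the MEX of the filled segment, matching the DP sketch above.

```cpp
#include <bits/stdc++.h>
using namespace std;

// For each L, sweep R to the right, marking the values appearing in [L, R]
// (a -1 cell contributes the smallest still-unplaced missing value), and
// extend the running MEX while the next needed value is already marked.
vector<vector<int>> precompute_M(int n, const vector<int>& q, const vector<int>& unused) {
    vector<vector<int>> M(n, vector<int>(n, 0));
    for (int L = 0; L < n; ++L) {
        vector<int> mark(n + 1, 0);
        int ptr = 0;   // next element of `unused` to place into a -1 cell
        int cur = 0;   // running MEX for this L
        for (int R = L; R < n; ++R) {
            if (q[R] == -1) mark[unused[ptr++]] = 1;
            else mark[q[R]] = 1;
            while (mark[cur]) ++cur;
            M[L][R] = cur;
        }
    }
    return M;
}
```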
TIME COMPLEXITY:
\mathcal{O}(N^2) per testcase.
CODE:
Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());
const int N = 5005;
ll dp[N][N];
int mex[N][N];
int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n; cin >> n;
        vector q(n, 0);
        for (int &x : q) cin >> x;

        vector pos(n+1, -1);
        for (int i = 0; i < n; ++i) {
            if (q[i] >= 0) pos[q[i]] = i;
        }
        vector<int> unused;
        for (int i = 0; i < n; ++i) {
            if (pos[i] == -1) unused.push_back(i);
        }

        // mex[i][j]: MEX of segment [i, j] after its -1 cells are filled
        // with the smallest still-missing values.
        vector mark(n+1, 0);
        for (int i = 0; i < n; ++i) {
            int ptr = 0;
            mark.assign(n+1, 0);
            mex[i][i] = 0;
            for (int j = i; j < n; ++j) {
                if (q[j] == -1) {
                    mark[unused[ptr]] = 1;
                    ++ptr;
                }
                else mark[q[j]] = 1;
                if (j > i) mex[i][j] = mex[i][j-1];
                while (mark[mex[i][j]]) ++mex[i][j];
            }
            dp[i][i] = 0;
            if (q[i] == 0 or (q[i] == -1 and unused[0] == 0)) dp[i][i] = (i+1)*(n-i);
        }

        // Interval DP over segment lengths.
        for (int len = 2; len <= n; ++len) {
            for (int L = 0; L+len <= n; ++L) {
                int R = L+len-1;
                dp[L][R] = max(dp[L+1][R] + 1ll*(L+1)*(n-R)*(mex[L][R] - mex[L+1][R]),
                               dp[L][R-1] + 1ll*(L+1)*(n-R)*(mex[L][R] - mex[L][R-1]));
            }
        }
        cout << dp[0][n-1] << '\n';
    }
}
Author's code (C++)
#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18
mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());
void Solve()
{
    int n; cin >> n;

    vector <int> p(n), pos(n, -1);
    for (int i = 0; i < n; i++){
        cin >> p[i];
        if (p[i] != -1) pos[p[i]] = i;
    }

    int m = 0;
    vector <int> loc;
    for (int i = 0; i < n; i++) if (p[i] == -1){
        m++;
        loc.push_back(i);
    }

    int base = 0;
    int L = n - 1, R = 0;
    int st = 0;
    for (int i = 0; i < n; i++){
        if (pos[i] == -1) break;
        ++st;
        L = min(L, pos[i]);
        R = max(R, pos[i]);
        base += (L + 1) * (n - R);
    }

    if (m == 0){
        cout << base << "\n";
        return;
    }

    vector dp(m, vector<int>(m, 0));
    for (int i = 0; i < m; i++){
        int l = min(L, loc[i]);
        int r = max(R, loc[i]);
        dp[i][i] = base + (l + 1) * (n - r);
    }

    int cur = 1;
    for (int i = st + 1; i < n; i++){
        if (pos[i] == -1){
            cur += 1;
            for (int i = 0; i + cur - 1 < m; i++){
                int j = i + cur - 1;
                int l = min(L, loc[i]);
                int r = max(R, loc[j]);
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1]) + (l + 1) * (n - r);
            }
        } else {
            L = min(L, pos[i]);
            R = max(R, pos[i]);
            for (int i = 0; i + cur - 1 < m; i++){
                int j = i + cur - 1;
                int l = min(L, loc[i]);
                int r = max(R, loc[j]);
                dp[i][j] += (l + 1) * (n - r);
            }
        }
    }

    cout << dp[0][m - 1] << "\n";
}
int32_t main()
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);

    int t = 1;
    // freopen("in", "r", stdin);
    // freopen("out", "w", stdout);
    cin >> t;
    for(int i = 1; i <= t; i++)
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n";
    return 0;
}