PERMFILLM - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: iceknight1093
Editorialist: iceknight1093

DIFFICULTY:

Medium

PREREQUISITES:

Dynamic programming

PROBLEM:

You’re given a partially filled permutation Q.
Across all ways to complete Q and obtain a permutation P, find the maximum possible value of

\sum_{i=1}^N \sum_{j=i}^N \text{MEX}(P_i, P_{i+1}, \ldots, P_j)

EXPLANATION:

To begin, we first need to transform the computation of sum-over-subarray-MEXes into a form that’s nicer to work with.

One way to think of the MEX of a set S is as follows.

  • Start with MEX = 0.
  • Then, for each integer k \ge 0,
    • If S contains all the elements 0, 1, 2, \ldots, k, then increase MEX by 1.

It should be obvious that this computes \text{MEX}(S) correctly.

Now, keeping this viewpoint in mind, let’s define c_i to be the number of subarrays of P that contain all the values \{0, 1, \ldots, i\}.
Note that we don’t care whether the subarray contains values greater than i; as long as it contains everything till i it will add to c_i.
In particular this means subarrays can contribute to multiple c_i; for example the subarray [0, 4, 1, 2] would contribute to all three of c_0, c_1, c_2.

Observe that with this definition, the sum of MEX-es across all subarrays is simply

c_0 + c_1 + c_2 + \ldots + c_{N-1}

This is simply because a subarray with MEX M will always add 1 to exactly all of c_0, c_1, \ldots, c_{M-1}.

This viewpoint is particularly useful because computing the number of subarrays containing all of 0, 1, 2, \ldots, i (without regard for higher elements) is actually quite easy:

  • Let L_i = \min(\text{pos}_0, \text{pos}_1, \ldots, \text{pos}_i)
    That is, L_i is the index of the leftmost occurring element among 0, 1, \ldots, i
  • Let R_i = \max(\text{pos}_0, \text{pos}_1, \ldots, \text{pos}_i)
    That is, R_i is the index of the right occurring element among 0, 1, \ldots, i
  • Then, any subarray that contains both indices L_i and R_i will contain all required values; and vice versa.
    The number of such subarrays is hence just L_i \times (N+1-R_i).
    (This is using 1-indexing).

In particular, note that these minimal subarrays form a chain under the containment relation, i.e. the minimal subarray for i will always be contained inside the minimal subarray for i+1.


Now, our aim is to fill in Q to maximize the sum of subarray MEX-es.

Intuitively, it makes sense to cluster smaller values near each other, so that they can all be included in a larger number of subarrays (hence increasing potential contribution to the sum).
In particular, we can make the following observation:

  • Let i_1, i_2, \ldots, i_k be the indices that are empty in Q, from left to right.
  • Let x_1, x_2, \ldots, x_k be the values that are missing from Q, from small to large.
  • Then, there exists an optimal solution having the following property:
    • For every j (1 \le j \le k), the values x_1, x_2, \ldots, x_j will be placed in some contiguous segment of empty indices.
      That is, there will exist integers l, r such that r-l+1 = j and the values x_1, \ldots, x_j are placed (in some order) at indices i_l, i_{l+1}, \ldots, i_r.

It is not very hard to prove this - it’s trivially true for j = 1, and then if it’s true for j but not for j+1, one can show with an exchange argument that bringing the value x_{j+1} closer to the current segment containing x_1, \ldots, x_j can only improve (or at least not worsen) the answer; so doing it will keep an optimal solution optimal, giving us what we need.

We can use this information to design a dynamic programming solution.

Define dp[L][R] to be the maximum possible answer considering only the contributions of subarrays lying in [L, R], assuming all -1's in this range have been filled up with some prefix of missing elements (in some order).

We now need to figure out transitions.

Let’s first look at the segment [L, R] itself.
Since we’re definitely replacing all -1's in this segment with some prefix of missing elements, even though we don’t know their order yet, the set of elements in this segment is known to us.
Let’s define M_{L, R} to be the largest integer such that all of 0, 1, 2, \ldots, M_{L, R} appear in [L, R] after appropriate replacement.
Note that [L, R] can now be the “minimal segment” for only some values \le M_{L, R}.

Some smaller values might have minimal segments that are properly contained in [L, R], so we don’t want to count these.
However, note that due to the containment property of the minimal segments, either all smaller minimal segments are entirely contained in [L, R-1], or all smaller minimal segments are entirely contained in [L+1, R].

So, suppose all of the smaller ones are contained in [L, R-1].
Then,

  • The contribution of the range [L, R-1] along with segments within it is, by definition, dp[L][R-1].
  • We then only need to care about the contribution of the segment [L, R] itself.
  • For this, observe that all values \le M_{L, R-1} will have their minimal segments handled within [L, R-1] anyway.
    So, only the values M_{L, R-1} + 1, \ldots, M_{L, R} will have their minimal segments be [L, R].
  • Each such value has a contribution of L\cdot (N+1-R).
  • Thus, we obtain an overall value of dp[L][R-1] + L\cdot (N+1-R)\cdot (M_{L, R} - M_{L, R-1}).

The same reasoning applies to assuming the subrange [L+1, R] instead, and similar computation gives us a cost of
dp[L+1][R] + L\cdot (N+1-R)\cdot (M_{L, R} - M_{L+1, R})
instead.

dp[L][R] is then just the maximum of these two values.


Observe that as long as we know the M_{L, R} values, the above solution runs in \mathcal{O}(N^2) time since we have quadratic states and constant-time transitions from each.

All the M_{L, R} values can themselves be precomputed in quadratic time too.
To do this, fix a value of L, and then compute the value for all R \ge L by processing them in ascending order.
This is relatively simple: keep an array marking which elements have been used already; and when processing index R,

  • If Q_R \ge 0, mark Q_R as used.
  • If Q_R = -1, mark the next smallest missing element as used instead.
  • Then, as long the current answer is marked, increase it.

This two-pointer algorithm runs in \mathcal{O}(N) for a fixed L, since the answer is non-decreasing with R and also doesn’t exceed N.

TIME COMPLEXITY:

\mathcal{O}(N^2) per testcase.

CODE:

Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

const int N = 5005;
ll dp[N][N];
int mex[N][N];

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n; cin >> n;
        vector q(n, 0);
        for (int &x : q) cin >> x;
        
        vector pos(n+1, -1);
        for (int i = 0; i < n; ++i) {
            if (q[i] >= 0) pos[q[i]] = i;
        }

        vector<int> unused;
        for (int i = 0; i < n; ++i) {
            if (pos[i] == -1) unused.push_back(i);
        }

        vector mark(n+1, 0);
        for (int i = 0; i < n; ++i) {
            int ptr = 0;
            mark.assign(n+1, 0);
            mex[i][i] = 0;
            for (int j = i; j < n; ++j) {
                if (q[j] == -1) {
                    mark[unused[ptr]] = 1;
                    ++ptr;
                }
                else mark[q[j]] = 1;

                if (j > i) mex[i][j] = mex[i][j-1];
                while (mark[mex[i][j]]) ++mex[i][j];
            }
            dp[i][i] = 0;
            if (q[i] == 0 or (q[i] == -1 and unused[0] == 0)) dp[i][i] = (i+1)*(n-i);
        }

        for (int len = 2; len <= n; ++len) {
            for (int L = 0; L+len <= n; ++L) {
                int R = L+len-1;
                dp[L][R] = max(dp[L+1][R] + 1ll*(L+1)*(n-R)*(mex[L][R] - mex[L+1][R]), dp[L][R-1] + 1ll*(L+1)*(n-R)*(mex[L][R] - mex[L][R-1]));
            }
        }
        cout << dp[0][n-1] << '\n';
    }
}
Author's code (C++)
#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

void Solve() 
{
    int n; cin >> n;
    
    vector <int> p(n), pos(n, -1);
    for (int i = 0; i < n; i++){
        cin >> p[i];
        if (p[i] != -1) pos[p[i]] = i;
    }
    
    int m = 0;
    vector <int> loc;
    for (int i = 0; i < n; i++) if (p[i] == -1){
        m++;
        loc.push_back(i);
    }
    
    int base = 0;
    int L = n - 1, R = 0;
    int st = 0;
    for (int i = 0; i < n; i++){
        if (pos[i] == -1) break;
        ++st;
        L = min(L, pos[i]);
        R = max(R, pos[i]);
        
        base += (L + 1) * (n - R);
    }
    
    if (m == 0){
        cout << base << "\n";
        return;
    }
    
    vector dp(m, vector<int>(m, 0));
    for (int i = 0; i < m; i++){
        int l = min(L, loc[i]);
        int r = max(R, loc[i]);
        dp[i][i] = base + (l + 1) * (n - r);
    }
    int cur = 1;
    
    for (int i = st + 1; i < n; i++){
        if (pos[i] == -1){
            cur += 1;
            for (int i = 0; i + cur - 1 < m; i++){
                int j = i + cur - 1;
                int l = min(L, loc[i]);
                int r = max(R, loc[j]);
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1]) + (l + 1) * (n - r);
            }
        } else {
            L = min(L, pos[i]);
            R = max(R, pos[i]);
            for (int i = 0; i + cur - 1 < m; i++){
                int j = i + cur - 1;
                int l = min(L, loc[i]);
                int r = max(R, loc[j]);
                dp[i][j] += (l + 1) * (n - r);
            }
        }
    }
    
    cout << dp[0][m - 1] << "\n";
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}