SUBADDLXM0 - Editorial

iceknight1093 · May 20, 2026, 4:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: iceknight1093
Tester: raysh07
Editorialist: iceknight1093

DIFFICULTY:

Medium

PREREQUISITES:

Binary search

PROBLEM:

You’re given an array A. You can partition it into some M \ge 1 non-empty subarrays, then add 0, 1, 2, \ldots, M-1 to the elements of each of these subarrays in some order (all elements in the same subarray receive the same added value), and then reduce all elements modulo K.

Find the lexicographically smallest final array.

Here, N \le 2000.

EXPLANATION:

The constraints in the easy version suggest that some sort of quadratic solution should work.

First, let’s look at some value A_i.
Suppose we want to make A_i equal to r in the end.
For this, we need to add the value (r - A_i) \bmod K here.
If x denotes this value, then all of the labels x, x+K, x+2K, \ldots are valid choices to assign to index 1.

However, note that if we want to add some value x to an index, we definitely need to use the values 0, 1, 2, \ldots, x-1 at some indices too - we cannot skip any of them.
So, it’s clearly better to use the smallest one possible - that is, use x if possible; otherwise use x+K, and so on.

Now, clearly at index 1, if x \ge N then it’s impossible to use the value x; because we only have N-1 elements remaining but we need to use up values 0, 1, 2, \ldots, N-1 among them which is impossible.
This can be generalized: suppose we’re at index i, with the maximum value used being P and among the values in [0, P], Q of them are unused.
Then, we definitely need Q \le N-i, since there are only N-i elements to the right of this index.

Lexicographic minimization is, by definition, done by greedily minimizing from left to right.

So, suppose we’ve processed indices 1, 2, \ldots, i-1, and now we’re looking at index i.
Let P denote the maximum value used so far.
For each remainder r (0 \le r \lt K), let’s also maintain the next-highest unused value of this remainder, say as nxt[r].

Now, suppose we fix y to be the final value of A_i.
For this, we need to add a value that’s (A_i - y) \bmod K to this index.
Let r = (A_i - y) \bmod K.
There are now two possibilities:

Index i-1 did not use the remainder r.
Here, we’re forced to then start a new subarray with value r.
The smallest value that can be used to do this is, by definition, nxt[r].
Note that this is only possible if there are enough remaining elements to the right to use up all unused values \le nxt[r], which is a simple inequality check.
Index i-1 did use the remainder r.
This is the more interesting case, because we have two options: we can either start a new subarray with value nxt[r], or we can extend the previous subarray.
At a glance, it might seem that it’s always optimal to extend the previous subarray, but this is actually not true - in some cases it’s optimal to start a new subarray instead.

To deal with the latter case, we instead allow for some slack.

Let’s define two quantities lo[x] and hi[x], where:

hi[x] is the number of elements so far that need a remainder of x added to them, in the optimal solution.
lo[x] is the number of blocks of elements so far that need a remainder of x added to them, in the optimal solution.

Essentially, lo[x] is what happens if we always extend a subarray when possible; and hi[x] is what happens if we never extend subarrays (unless forced to by values going too high.)

Observe that under this model, we have some “slack”, of hi[x] - lo[x] for each value x; because we can always use anywhere in [lo[x], hi[x]] values with remainder x.

Note also that with the lo[x] and hi[x] arrays, we no longer need to maintain nxt[x] - that information comes implicitly with lo and hi.
In particular, the set of values with remainder x that are available to use are exactly x, x+K, \ldots, K\cdot hi[x] + x.

Using this, let’s go back to the two cases we had.
Recall that we fixed the remainder r.

Let’s see what happens if we try to start a new subarray at index i.
We will definitely use a value that’s at least K\cdot lo[r] + r here.
So, if the previous forced maximum element was P, the new forced maximum will be \max(P, K\cdot lo[r] + r).
Let this new forced maximum be P'.
Also, the largest value with remainder r available to us is K\cdot hi[r] + r, which we can include if we want to.

We want to check if this new maximum P' is feasible or not.

To do that, note that for any remainder x, as noted before, the available values are x, x+K, \ldots, K\cdot hi[x] + x.
However, in the interest of checking feasibility of P' , we only care about the ones among these that are \le P'; we can’t use larger ones.
Note that each such label corresponds to one element that requires remainder x.
So, each label larger than P' corresponds to an element that must be in the same subarray as its previous element.

Let Q denote the number of such “forced” elements.
(We’ll come back to computing Q later.)
Then, it can be observed that P' is valid if and only if P' + Q + 1 \le N; because we have P'+1 values that need to be assigned (0, 1, \ldots, P') but only N-Q potential subarrays; because of the N elements we already saw that Q of them are forced to merge with their predecessors.

So, if P' + Q + 1 \le N then we’re good; otherwise we’re not.

The case of not starting a subarray at index i is similar; just that we don’t have a new forced maximum since we don’t need to use K\cdot lo[r] + r, so we just have P = P' instead.

The optimal choice is then whichever is the smallest value of A_i that allows for this condition to be satisfied.

Once the optimal value r is known, update lo[r] and hi[r] appropriately.

Finally, let’s circle back to computing the value Q quickly.
Let’s create a sorted list of all elements of the form x, x+K, \ldots, K\cdot hi[x] + x for all x so far.
If we had this, finding Q could be done with a simple binary search on the list.

It’s not hard to see that because we are allowed quadratic time, we can indeed maintain this list: after deciding the optimal value of A_i, simply insert the appropriate value of K\cdot hi[r] + r into the sorted list while keeping it sorted (which can be done in linear time, and happens N times in total.)

TIME COMPLEXITY:

\mathcal{O}(N^2 \log N) per testcase.

CODE:

Editorialist's code (C++)

// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, k; cin >> n >> k;
        vector a(n, 0);
        for (int &x : a) cin >> x;

        vector lo(k, 0), hi(k, 0);
        vector<int> used;
        int prv = -1, mx = -1;
        for (int i = 0; i < n; ++i) {
            int ans_r = -1;
            for (int r = 0; r < k; ++r) {
                bool start = (r != prv);
                int p = mx;
                if (start) p = max(p, lo[r]*k + r);

                auto it = ranges::upper_bound(used, p);
                int extra = end(used) - it;
                if (hi[r]*k + r > p) ++extra;

                if (p + extra + 1 <= n) {
                    if (ans_r == -1 or (a[i] + r) % k < (a[i] + ans_r) % k) ans_r = r;
                }
            }

            a[i] = (a[i] + ans_r) % k;
            bool start = (ans_r != prv);
            if (start) {
                mx = max(mx, lo[ans_r]*k + ans_r);
                ++lo[ans_r];
            }

            used.push_back(hi[ans_r]*k + ans_r);
            ranges::sort(used);
            ++hi[ans_r];
            prv = ans_r;
        }

        for (int x : a) cout << x << ' ';
        cout << '\n';
    }
}