PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: iceknight1093
Tester: sushil2006
Editorialist: iceknight1093
DIFFICULTY:
Medium
PREREQUISITES:
Divide and conquer
PROBLEM:
The score of an array equals its length minus the number of pairs of equal elements contained in it.
Given an array A, compute the maximum score across all of its subarrays.
EXPLANATION:
The obvious (and slow) brute force algorithm is to go over all subarrays and compute the score for each one.
This can be done in quadratic time, by noting that when the subarray [L, R] is extended to [L, R+1], the change in score can be computed quite easily: it exactly equals 1 - \text{freq}[A_{R+1}], where \text{freq}[A_{R+1}] is the number of occurrences of A_{R+1} in the subarray [L, R].
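For concreteness, here is a minimal sketch of this quadratic brute force; the function name and 0-indexing are my own choices, not part of the model solution.

#include <bits/stdc++.h>
using namespace std;

// O(N^2) brute force: fix the left endpoint l, then extend the right
// endpoint one step at a time, maintaining the score incrementally.
long long bruteForce(const vector<int>& a) {
    long long ans = 0;  // any single element already has score 1
    int n = a.size();
    for (int l = 0; l < n; l++) {
        map<int, long long> freq;  // frequencies inside the current subarray
        long long score = 0;       // score of [l, r]
        for (int r = l; r < n; r++) {
            score += 1 - freq[a[r]];  // length grows by 1, pairs grow by freq[a[r]]
            freq[a[r]]++;
            ans = max(ans, score);
        }
    }
    return ans;
}

int main() {
    cout << bruteForce({1, 2, 1, 1}) << "\n";  // prints 2
}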
This brute force provides a nice starting point, because it leads to the following observation.
For the left index L, let’s define \text{opt}_L to be the right index of the subarray that has maximum score, among all subarrays starting at L.
If there are multiple right indices that maximize the score, we choose \text{opt}_L to be the largest of them.
The key observation here is that \text{opt}_L \le \text{opt}_{L+1}, that is, the “optimal right indices” form a non-decreasing sequence.
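For instance, if A = [1, 2, 1, 1], then \text{opt} = (3, 4, 4, 4): for L = 1 the scores of [1, 1], [1, 2], [1, 3], [1, 4] are 1, 2, 2, 1 respectively, so the rightmost maximizer is \text{opt}_1 = 3.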
Proof
The proof is straightforward by contradiction.
Suppose \text{opt}_L \gt \text{opt}_{L+1}.
Consider the elements at indices \text{opt}_{L+1} + 1, \ldots, \text{opt}_L. For such an index i, its “contribution” to the score of [L, \text{opt}_L] is surely not greater than its contribution to the score of [L+1, \text{opt}_L].
Here, we define the contribution of an element to be 1 minus the number of elements equal to it that lie to its left within the subarray.
This allows the score of a subarray to be written as the sum of the contributions of all its elements, since each equal pair ends up being subtracted exactly once. The above observation about contributions follows immediately from this definition; after all, for any index i, there can’t be more occurrences of A_i in [L+1, i-1] than in [L, i-1], the latter being a strictly larger segment.
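For example, if A = [1, 2, 1, 1], the contributions are 1, 1, 0, -1 (the third element has one equal element before it, and the fourth has two), and indeed 1 + 1 + 0 - 1 = 1 equals the score 4 - 3 = 1 of the whole array, which contains 3 equal pairs.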
There are now two cases:
- The total contribution of indices \text{opt}_{L+1} + 1, \ldots, \text{opt}_L to [L, \text{opt}_L] is negative.
If this were the case, we could simply discard all these elements by choosing the subarray [L, \text{opt}_{L+1}] instead, to obtain a strictly higher score of a subarray starting at L.
This is impossible, since it contradicts \text{opt}_L being optimal.
- The total contribution of indices \text{opt}_{L+1} + 1, \ldots, \text{opt}_L to [L, \text{opt}_L] is non-negative.
Here, note that the segment [L+1, \text{opt}_L] then surely has a score not less than that of [L+1, \text{opt}_{L+1}]: as noted above, if these elements have an overall non-negative contribution with respect to L, then they have an overall non-negative contribution with respect to L+1 as well.
But then this contradicts \text{opt}_{L+1} being the rightmost optimum for L+1.
Since we reach a contradiction in either case, \text{opt}_L \gt \text{opt}_{L+1} is impossible.
This fact is very helpful, because it allows us to compute all the values of \text{opt} quickly using divide and conquer.
Specifically, let’s define a function f(L, R, x, y) whose aim is to compute the value of \text{opt}_i for all L \le i \le R, with the knowledge that x \le \text{opt}_i \le y for all i \in [L, R].
This can be done as follows:
- Let M = \text{midpoint}(L, R).
- Compute the scores of all subarrays [M, j], for each x \le j \le y.
This can be done in \mathcal{O}(y - M) time, by starting with the subarray [M, M] and extending it right one step at a time till [M, y] is reached.
During this process, we also obtain the value of \text{opt}_M.
- Then, recursively compute f(L, M-1, x, \text{opt}_M) and f(M+1, R, \text{opt}_M, y) to find the answers for the left and right sides of M (a sketch of this recursion follows below).
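Here is a minimal sketch of this recursion before any optimization; scoreOf recomputes each score from scratch, so this version is still slow, and all names here are my own.

#include <bits/stdc++.h>
using namespace std;

vector<int> a;       // 1-indexed input array (a[0] unused)
long long ans = 0;   // best score seen so far

// Naive score of [l, r]: length minus the number of equal pairs.
long long scoreOf(int l, int r) {
    map<int, int> freq;
    long long pairs = 0;
    for (int i = l; i <= r; i++) pairs += freq[a[i]]++;
    return (r - l + 1) - pairs;
}

// f(L, R, x, y): compute opt_i for every i in [L, R],
// knowing that x <= opt_i <= y for all of them.
void f(int L, int R, int x, int y) {
    if (L > R) return;
    int M = (L + R) / 2;
    long long best = LLONG_MIN;
    int optM = -1;
    for (int j = max(M, x); j <= y; j++) {  // try every candidate right endpoint
        long long s = scoreOf(M, j);
        ans = max(ans, s);
        if (s >= best) { best = s; optM = j; }  // keep the rightmost optimum
    }
    f(L, M - 1, x, optM);   // opt values left of M are at most optM
    f(M + 1, R, optM, y);   // opt values right of M are at least optM
}

int main() {
    a = {0, 1, 2, 1, 1};    // the array [1, 2, 1, 1], 1-indexed
    f(1, 4, 1, 4);
    cout << ans << "\n";    // prints 2
}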
Now, if implemented directly, this still is quadratic: we still end up processing every index as a left endpoint (just in a different order, due to the divide and conquer), and in the worst case where y = N always, we’ll do linear work for each one.
The bottleneck here is the part where we compute the scores of all [M, j] with x \le j \le y: specifically, that we always start from [M, M] and extend to the right.
To improve this, note that we don’t just have the ability to extend a subarray one step to the right: we can also extend it one step to the left, and even shrink it by one step on either side.
That is, from [L, R], we can move to all four of [L+1, R], [L-1, R], [L, R-1], [L, R+1] in constant time by just storing the frequency array.
This allows us to reuse information across different nodes of the divide and conquer solution.
Let’s maintain global pointers l_0 and r_0, as well as a global frequency array \text{freq} that corresponds to the frequency of elements in [l_0, r_0].
Now, when processing f(L, R, x, y), we still want to compute the scores of all of [M, j] for j \in [x, y].
However, this time we’ll do so by moving the global pointers [l_0, r_0] appropriately till they reach [M, x] first, then repeatedly extend the right pointer till it reaches y.
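In code, the pointer movement can look as follows; this mirrors the add/era/change helpers in the author’s solution at the end, though the names here are my own, and it assumes freq is sized to the value range with the window initialized to [1, 1]. Extending before shrinking keeps the window non-degenerate, so all frequencies stay non-negative.

#include <bits/stdc++.h>
using namespace std;

vector<int> a;             // 1-indexed input array
vector<long long> freq;    // frequency of each value inside the window
long long pairsCnt = 0;    // number of equal pairs inside the window
int l0 = 1, r0 = 1;        // current window [l0, r0]

// Adding an occurrence of x creates freq[x] new equal pairs.
void add(int x) { pairsCnt += freq[x]++; }
// Removing an occurrence of x destroys freq[x] - 1 equal pairs.
void del(int x) { pairsCnt -= --freq[x]; }

// Move the window to exactly [l, r] in |l0 - l| + |r0 - r| steps.
// The score of the window is then (r0 - l0 + 1) - pairsCnt.
void moveTo(int l, int r) {
    while (l0 > l) add(a[--l0]);   // extend left
    while (r0 < r) add(a[++r0]);   // extend right
    while (l0 < l) del(a[l0++]);   // shrink left
    while (r0 > r) del(a[r0--]);   // shrink right
}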
While this might not seem like much of a change, it in fact improves our complexity significantly!
To understand why, let’s look at a single “level” of the divide-and-conquer tree.
Note that on this level, the entire array is partitioned into several smaller subarrays - let’s call them [l_i, r_i].
By nature of the divide and conquer algorithm, these smaller subarrays are processed from left to right, i.e. first [l_1, r_1] is visited, then [l_2, r_2], and so on.
Let’s now look specifically at the left pointer l_0.
- It takes \mathcal{O}(N) work for the left pointer to reach [l_1, r_1] initially.
- It then does work proportional to the length of [l_1, r_1] to reach its midpoint.
Once the midpoint is reached, the left pointer doesn’t move till the segment is processed.
- Then, recursive calls are made to its children.
Importantly, observe that while l_0 might move when these recursive calls are made, it will always remain in [l_1, r_1].
We don’t care how exactly it moves inside children calls: those movements are tied to other levels and not this one.
- After the children are completely processed, l_0 will still lie in [l_1, r_1].
The next segment to be processed is [l_2, r_2], which can be reached by only increasing l_0 - hence taking \mathcal{O}(r_1 - l_1) work.
Observe that after the pointer first reached [l_1, r_1], it only did \mathcal{O}(r_1 - l_1) work in this segment, considering only the current level.
- Once it reaches [l_2, r_2], the situation is the same: it will do \mathcal{O}(r_2 - l_2) work there, and end up in [l_2, r_2] after the children are processed.
Then it can move to [l_3, r_3], and so on.
In general, after reaching the segment [l_1, r_1], the left pointer proceeds to only do work proportional to the length of the segment it is in.
To reach [l_1, r_1] it must do \mathcal{O}(N) work, but this happens only once.
Since the segments partition the array, this means the pointer l_0 does \mathcal{O}(N) work over all segments when considering this level only.
The story for the right pointer r_0 is similar: because the \text{opt}_i values are non-decreasing, the range [1, N] is effectively broken into non-overlapping segments, one per node, describing where that node’s \text{opt} values can lie (from left to right).
Similar analysis shows that r_0 ends up doing linear work on a single level too (the reason is basically the same: for each node, it does work corresponding to the length of the segment it is in; and after processing children it will remain in this segment).
So, for a fixed level, both l_0 and r_0 move \mathcal{O}(N) times.
Because our divide-and-conquer partitions based on midpoint, there are only \mathcal{O}(\log N) levels to it.
This makes the overall complexity \mathcal{O}(N\log N), which is easily fast enough.
To compute the answer, call f(1, N, 1, N), and update the answer each time one of the pointers is moved.
By nature of the solution, we’ll definitely go over the optimal segment at some point, so no extra work is needed at the end of the recursion.
TIME COMPLEXITY:
\mathcal{O}(N\log N) per testcase.
CODE:
Author's code (C++)
#include <bits/stdc++.h>
using namespace std;

#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

void Solve()
{
    int n; cin >> n;
    vector<int> a(n + 1);
    for (int i = 1; i <= n; i++){
        cin >> a[i];
    }

    // Current window [cl, cr], its frequency table f,
    // and p = the number of equal pairs inside the window.
    int cl = 1, cr = 1;
    int p = 0;
    vector<int> f(n + 1);
    f[a[1]]++;

    int ans = 0;

    // Adding an occurrence of x creates f[x] new equal pairs.
    auto add = [&](int x){
        p += f[x]++;
    };
    // Removing an occurrence of x destroys f[x] - 1 equal pairs.
    auto era = [&](int x){
        p -= --f[x];
    };
    // Move the window to exactly [l, r], one step at a time
    // (extend first so the window stays valid throughout).
    auto change = [&](int l, int r){
        while (cl > l){ cl--; add(a[cl]); }
        while (cr < r){ cr++; add(a[cr]); }
        while (cl < l){ era(a[cl]); cl++; }
        while (cr > r){ era(a[cr]); cr--; }
    };

    // dnc(l, r, ql, qr): compute opt_i for every i in [l, r],
    // knowing that all these opt_i lie in [ql, qr].
    auto dnc = [&](auto self, int l, int r, int ql, int qr){
        if (l > r) return;
        int mid = (l + r) / 2;
        int best = -INF, id = -1;
        // Evaluate every candidate subarray [mid, i].
        for (int i = max(mid, ql); i <= qr; i++){
            change(mid, i);
            int val = (cr - cl + 1) - p;  // score = length - equal pairs
            ans = max(ans, val);
            if (val > best){
                best = val;
                id = i;
            }
        }
        // opt values left of mid are at most id; right of mid, at least id.
        self(self, l, mid - 1, ql, id);
        self(self, mid + 1, r, id, qr);
    };

    dnc(dnc, 1, n, 1, n);
    cout << ans << "\n";
}

int32_t main()
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in", "r", stdin);
    // freopen("out", "w", stdout);
    cin >> t;
    for (int i = 1; i <= t; i++)
    {
        // cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n";
    return 0;
}