MEANDIFF - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: am_aadvik
Tester: raysh07
Editorialist: iceknight1093

DIFFICULTY:

Simple

PREREQUISITES:

Sorting

PROBLEM:

Define f(S) to be the maximum of |x - \text{avg}(S)| across all x \in S.
Given an array A, compute the maximum possible value of f(S) across all subsequences of A.

EXPLANATION:

For a fixed S, it’s obvious that the optimal x will be either the minimum or the maximum element of S; whichever is further away from the average.

Now, let’s look at the given array A.
Given that the minimum and maximum are important for a fixed subset, it makes sense to look at the overall minimum and maximum of A.
And indeed, it can be proved that in an optimal choice of S, both \min(A) and \max(A) will be included!

Proof

Let m = \min(A).
Suppose we choose a subset that doesn’t include m.
What we’ll do is discard the existing minimum of the subset and include m instead.

Before including m, let |S| = K and X be its average (so the sum of the set is K\cdot X).
Let m_2 be the existing minimum of S.
Note that the average is certainly \gt m, since all elements of the set are \gt m.

Let d = m_2 - m.
Upon replacing m_2 with m, the average reduces by \frac{d}{K}.
This value is certainly not larger than d.
So, the decrease in average is offset by the decrease in the minimum of the set; meaning the entire value (\text{avg}(S) - \min(S)) increases.

Of course, since the maximum element doesn’t change and the average decreases, \max(S) - \text{avg}(S) also increases.

This means including \min(A) in the subset is a net profit.

Similar reasoning applies to including \max(A) in the subset.

Let’s use this information to decide what the optimal subset should be.

We want the average of S to be either very far from \min(A) or be very far from \max(A).
Let’s try both cases separately, and take the maximum of both - that will give us the overall optimum.

So, let’s just try to maximize \text{avg}(S) - \min(A) for now.
Since \min(A) is fixed, we’re really just trying to maximize \text{avg}(S).
It’s thus optimal to choose ‘large’ elements.

However, which large elements should we choose?
Well, we can just try all possible choices!
Observe that if it’s optimal to include an element x in the set S, then it’s optimal to always include all elements larger than x as well.

Proof

Suppose x is included and y is not included, where y \ge x.

Let M = \text{avg}(S).
Now,

  • If M \le x, then including y into the set will not decrease its average.
    So, it’s optimal to include y.
  • If M \gt x, then removing x from the set will further increase its average.
    This goes against x being part of an optimal solution, so can’t happen.

Thus, if it’s optimal to include x, it’s also optimal to include y.

This means there are only N-1 possible choices for the set S if we want to maximize \text{avg}(S).

  • \min(A) and \max(A).
  • \min(A) and the two largest elements of A.
  • \min(A) and the three largest elements of A.
  • \min(A) and the four largest elements of A.
    \ldots
  • \min(A) and the N-2 largest elements of A.
  • \min(A) and the N-1 largest elements of A (which is just the entirety of A).

We can simply try all of them and compute the best possible value of \text{avg}(S).
To do this in subquadratic time: observe that we can simply sort A, after which we’re looking at suffixes of the sorted array.
We then want to compute the mean of each set formed by taking the first element along with some suffix; which can be done quite easily when the suffix sums of this sorted array are known.


This allows us to maximize the value of \text{avg}(S) - \min(A).
We had another option: maximizing \max(A) - \text{avg}(S).
This can be handled similarly, just that we want to minimize \text{avg}(S) instead (so it’s optimal to choose elements from the prefix of sorted elements instead).

Try all options and report the best one.
This takes linear time after sorting.

TIME COMPLEXITY:

\mathcal{O}(N\log N) per testcase.

CODE:

Tester's code (C++)
#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

void Solve() 
{
    int n; cin >> n;
    
    vector <int> a(n + 1);
    for (int i = 1; i <= n; i++) cin >> a[i];
    
    sort(a.begin() + 1, a.end());
    
    int sum = 0;
    int ans = 0;
    for (int i = 1; i < n; i++){
        sum += a[i];
        
        int mean = (sum + a[n]) / (i + 1);
        ans = max(ans, a[n] - mean);
    }
    
    sum = 0;
    for (int i = n; i > 1; i--){
        sum += a[i];
        
        int mean = (sum + a[1]) / (n - i + 2);
        ans = max(ans, mean - a[1]);
    }
    
    cout << ans << "\n";
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}
1 Like

Iam almost close to the solution in contest. Just one less than condition ruined everything. But loved this question.