SETMED - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

Easy

PREREQUISITES:

Sorting, Prefix sums, Binary search

PROBLEM:

You’re given an array A.
You can perform the following operation at most once: choose a subsequence, and replace all of its elements by the median of the subsequence.
Find the maximum possible sum of A.

The median of a sequence is obtained by sorting it, and taking the middle element (odd length) or the left of the two middle elements (even length).

EXPLANATION:

To make things simple, let’s first sort the array A - this obviously doesn’t change the answer.
From now on, we’ll assume A_1 \le A_2 \le \ldots \le A_N

As a first observation, note that in an optimal solution, we’ll never choose an even-length subsequence.
This is because, if we choose the subsequence (x_1, x_2, \ldots, x_{2k}) with median x_k, it would instead be better to choose the subsequence (x_1, x_2, \ldots, x_{2k-1}), i.e. drop the last element to obtain an odd-length subsequence.
This is because the odd-length subsequence still has median x_{k}, so in both cases the elements x_1, \ldots, x_{2k-1} will get set to x_k. However, by dropping x_{2k} we can ensure that x_{2k} is unchanged rather than being set to x_k, which is profitable because x_{2k} \ge x_k.


Now, suppose that we’ve decided that the middle element of our chosen subsequence is going to be A_i.
Since we’re only dealing with odd-length subsequences, we must thus choose an equal number of elements to both the left and the right of index i.
There are now two things to be decided: the number of elements that must be chosen, and which ones.

Suppose we decide that we’re taking k elements on each side of index i.
Seeing that all these elements will be made equal to A_i, we can see that the “profit” of choosing element A_j with respect to the sum of the array, equals (A_i - A_j).
We want to maximize the profit, so it’s clearly better to choose smaller values of A_j.
This means it’s optimal to choose:

  • A_1, A_2, \ldots, A_k to the left of A_i.
  • A_{i+1}, A_{i+2}, \ldots, A_{i+k} to the right of A_i.

The overall change to the sum of A then equals

2k\cdot A_i - (A_1 + \ldots + A_k) - (A_{i+1} + \ldots + A_{i+k})

This is pretty easy to compute in constant time - it’s just a couple of range sums of A, so simple prefix sums will do the job.

The issue now is the choice of k - trying all of them will result in a complexity of \mathcal{O}(N) per index, and \mathcal{O}(N^2) overall, which is much too slow.


To improve the above algorithm, let’s try to find the optimal k for a fixed index i a bit quicker.

Let’s look a bit closer at the profit - or rather, how it’s computed.
Suppose we define an array p such that p_j = (A_i - A_j) + (A_i - A_{i+j}).
This is the profit of choosing the two elements at indices j and i+j, with the median being A_i.

With this definition, the overall profit when we fix k is exactly p_1 + p_2 + \ldots + p_k.
Now, note that p is a non-increasing array, i.e. p_j \ge p_{j+1} for all j.
This is because A is sorted in ascending order; so we have A_i - A_j \ge A_i - A_{j+1} and similarly A_i - A_{i+j} \ge A_i - A_{i+j+1}.

In particular, this means some prefix of p will contain non-negative values (and so are profitable to take), and then once it goes negative, it will remain negative (so we never want to take these).

This means the optimal choice of k is, quite simply, the largest index k such that p_k \ge 0.
Finding this optimal k is now quite easy: since p is non-decreasing, we can simply binary search to find it.

Once the optimal k is known, computing its profit can be done in constant time with the help of prefix sums, as noted above.

This allows us to solve for a single median A_i in logarithmic time.
So, simply trying each A_i as the median and taking the best answer among all of them leads to a solution in \mathcal{O}(N\log N), which is easily fast enough.


When binary searching to find the optimal k for i, make sure to choose the upper bound on k appropriately.
k must be \le i-1 since we need to be able to choose k elements strictly to the left of i.
k must also be \le N-i since we need to be able to choose k elements strictly to the right of i.

TIME COMPLEXITY:

\mathcal{O}(N\log N) per testcase.

CODE:

Editorialist's code (PyPy3)
for _ in range(int(input())):
    n = int(input())
    a = list(map(int, input().split()))
    a.sort()

    pre = [0]
    for x in a: pre.append(pre[-1] + x)

    best = 0
    for i in range(1, n-1):
        lo, hi = 1, min(i, n-1-i)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if a[i]-a[mid-1] >= a[i+mid]-a[i]: lo = mid
            else: hi = mid-1
        
        cur = pre[lo] + pre[i+lo+1] - pre[i+1]
        nxt = 2*lo*a[i]
        best = max(best, nxt - cur)
    print(sum(a) + best)
2 Likes

A wonderful Editorial to explain the problem!!!

1 Like

I also tried to use bs to find the best sum for each number ranging from 1 to n-1 but i am getting incorrect answer after passing only 2 hidden test case

can you tell me why sir ??

include <bits/stdc++.h>
using namespace std;
typedef long long int lli;

lli fun(lli mini,vector& pref,vector& arr,lli ind)
{
lli s=0,e=mini-1;
lli sum=pref[pref.size()-1];
lli ans=0;
while(s<=e)
{
lli mid=(s+e)/2;
lli leftsum=(mid+1)*arr[ind]-pref[mid];
lli rightsum=(pref[mid+ind+1]-pref[ind])-(mid+1)*arr[ind];
if(rightsum>=leftsum)
{
e=mid-1;
}
else{ ans=max(ans,sum+leftsum-rightsum);s=mid+1;}
}
return ans;
}

int main() {
// your code goes here
int t;
cin>>t;
while(t–)
{
int n;
cin>>n;
vector arr(n);
lli sum=0;
for(int i=0;i<n;i++){cin>>arr[i];sum+=arr[i];}
sort(arr.begin(),arr.end());
vector pref(n);
pref[0]=arr[0];
for(int i=1;i<n;i++)pref[i]=arr[i]+pref[i-1];
for(int i=1;i<n-1;i++)
{
lli mini=min(i,n-i-1);
sum=max(sum,fun(mini,pref,arr,i));
}

    cout<<sum<<endl;
}

}

Sample Input
1
5
2 10 5 8 11
Your Output
43. This is answer of your accepted solution(editorial).actual answer would be to set 10 as median, change 2 and 5 to be 10 , increase is +13, original sum =36, so answer should be 49, but by sorting either you underestimated the profit or overestimated.

Which operation are you performing that allows you to set 2 and 5 to 10, without changing any other element?

Note that the operation allowed is “choose a subsequence, and set all the elements of this subsequence to the median of this subsequence”.

If you mean to choose the subsequence [2, 10, 5], note that its median is 5 and not 10, because (as is standard definition, and is also mentioned in the statement), the median is the middle element after sorting.