PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Authors: sushil2006, shyamer3, nilayan17
Tester: sushil2006
Editorialist: iceknight1093
DIFFICULTY:
Medium
PREREQUISITES:
Divide and conquer, Fenwick/segment trees
PROBLEM:
Solve the problem for each prefix of A.
EXPLANATION:
Recall from the solution to the easy version that, for a fixed array A, the answer was the maximum of k^2 + S_{N-k} across all k \in [0, N], where S_i denotes the sum of the i largest elements of A.
We now need to figure out a way to adapt this to solving for every prefix.
Note that S_{N-k} equals the sum of the entire array, minus the sum of the smallest k elements.
Let m_k denote the sum of the smallest k elements of the array.
With this, the answer instead becomes T + \max_{k=0}^N(k^2 - m_k), with T being the sum of the whole array.
T is a constant, so we want to find the largest value of k^2 - m_k instead.
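To make the reformulation concrete, here's a brute-force reference (a sketch with illustrative names, not part of the author's solution) that evaluates T + \max_k (k^2 - m_k) for every prefix in \mathcal{O}(N^2 \log N); it's useful for stress-testing the fast solution developed below.

#include <bits/stdc++.h>
using namespace std;

// Brute force over every prefix: sort the prefix so that its prefix sums
// are exactly the m_k values, then evaluate T + max_k (k^2 - m_k).
vector<long long> bruteForce(const vector<long long>& a) {
    int n = a.size();
    vector<long long> ans(n), pre;
    long long T = 0;
    for (int i = 0; i < n; ++i) {
        T += a[i];
        pre.push_back(a[i]);
        vector<long long> s = pre;
        sort(s.begin(), s.end());
        long long best = 0, mk = 0;          // k = 0 contributes 0^2 - m_0 = 0
        for (int k = 1; k <= i + 1; ++k) {
            mk += s[k - 1];                  // m_k = sum of the k smallest
            best = max(best, 1LL * k * k - mk);
        }
        ans[i] = T + best;
    }
    return ans;
}

int main() {
    for (long long v : bruteForce({2, 1, 3})) cout << v << ' ';  // prints: 2 4 9
    cout << '\n';
}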
Let’s define f_i(k) to be the value of k^2 - m_k when looking at the prefix of length i.
Let \text{opt}_i denote the value of k that maximizes f_i(k) (if there are multiple, choose the largest k among them).
Note that if we know all the \text{opt}_i values, computing the actual answers is not too hard - so we focus on computing \text{opt}_i.
It turns out that these values have a pretty simple structure: they're non-decreasing, i.e., \text{opt}_i \leq \text{opt}_{i+1}. One way to see this: appending an element x to the prefix leaves m_k unchanged for small k, and turns it into m_{k-1} + x once x is among the k smallest; so the gain f_{i+1}(k) - f_i(k) is non-decreasing in k, which can only push the maximizer to the right.
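If you'd rather not take this on faith, here's a small stress-test sketch (illustrative names, assumed random bounds) that computes \text{opt}_i naively and asserts that it never decreases:

#include <bits/stdc++.h>
using namespace std;

// Empirical check of the monotonicity claim (a sanity test, not a proof):
// compute opt_i naively on random arrays and assert opt_i <= opt_{i+1}.
int main() {
    mt19937 rng(12345);
    for (int iter = 0; iter < 1000; ++iter) {
        int n = 1 + rng() % 50;
        vector<long long> a(n), pre;
        for (auto &x : a) x = rng() % 100;
        int prevOpt = 0;
        for (int i = 0; i < n; ++i) {
            pre.push_back(a[i]);
            vector<long long> s = pre;
            sort(s.begin(), s.end());
            long long best = 0, mk = 0;
            int opt = 0;                          // k = 0 gives value 0
            for (int k = 1; k <= i + 1; ++k) {
                mk += s[k - 1];                   // m_k = sum of k smallest
                if (1LL * k * k - mk >= best) {   // '>=': largest k on ties
                    best = 1LL * k * k - mk;
                    opt = k;
                }
            }
            assert(opt >= prevOpt);               // the claimed monotonicity
            prevOpt = opt;
        }
    }
    cout << "ok\n";
}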
With this observation, we can now compute all the values of \text{opt}_i using divide and conquer.
Details
Let f(L, R, x, y) be a recursive function that computes \text{opt}_i for all i \in [L, R], while knowing that their values are in [x, y].
Let M = \text{midpoint}(L, R). We’ll first compute \text{opt}_M.
This can be done naively: for each k \in [x, y] (with k \leq M, since at most M elements can be chosen from a prefix of length M), compute f_M(k) and then take the best among them.
For now, suppose we're able to do this reasonably quickly. Then, to compute the values for everything else in the range, recursively call f(L, M-1, x, \text{opt}_M) and f(M+1, R, \text{opt}_M, y) - knowing \text{opt}_M allows us to tighten the bounds on the left/right halves.
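In code, the recursion might look like the following sketch, where f(i, k) stands for an assumed oracle evaluating f_i(k) (the author's code realizes this oracle with Fenwick trees, as explained below):

#include <bits/stdc++.h>
using namespace std;

// Sketch of the divide and conquer described above. f(i, k) is an assumed
// oracle returning f_i(k); opt[mid] receives the computed optimum.
void rec(int l, int r, int optl, int optr,
         const function<long long(int, int)>& f, vector<int>& opt) {
    if (l > r) return;
    int mid = (l + r) / 2;
    long long best = LLONG_MIN;
    int optMid = optl;
    // candidates lie in [optl, min(optr, mid)]: k can't exceed mid, since
    // at most mid elements can be chosen from a prefix of length mid
    for (int k = optl; k <= min(optr, mid); ++k) {
        long long val = f(mid, k);
        if (val >= best) { best = val; optMid = k; }  // '>=': largest k on ties
    }
    opt[mid] = optMid;
    rec(l, mid - 1, optl, optMid, f, opt);  // left half: optima are <= opt_mid
    rec(mid + 1, r, optMid, optr, f, opt);  // right half: optima are >= opt_mid
}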
Note that on each ‘level’ of the recursion, each value in [0, N] is tested at most twice as a candidate k, because consecutive candidate ranges overlap only at their shared endpoint \text{opt}_M.
So, there are \mathcal{O}(N) checks per level.
The recursion has only \mathcal{O}(\log N) levels because we repeatedly choose the midpoint to partition the range; so there are \mathcal{O}(N\log N) checks across all levels.
This leaves only the question of how to actually perform each check quickly, which turns into a data structure task.
More details
Each check essentially requires us to compute f_i(k) for a fixed i and k, meaning that we need to be able to quickly find the sum of the k smallest elements in a prefix.
On a static array, this problem isn’t too hard to solve: for example, store prefix sums of the (sorted) array and just look them up.
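For instance, a minimal sketch of the static case:

#include <bits/stdc++.h>
using namespace std;

// Static case: sort once, then the sum of the k smallest elements is a
// single prefix-sum lookup.
int main() {
    vector<long long> a = {5, 1, 4, 2, 3};
    sort(a.begin(), a.end());
    vector<long long> pref(a.size() + 1, 0);
    for (size_t i = 0; i < a.size(); ++i) pref[i + 1] = pref[i] + a[i];
    cout << pref[3] << '\n';  // sum of the 3 smallest: 1 + 2 + 3 = 6
}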
If the array has modifications, a segment tree/Fenwick tree built on values can be used instead: this allows quickly finding the k-th smallest element, as well as the sum of all elements up to the k-th (which is now a range sum); and updates are just point updates.
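Here's a sketch of such a structure, using two Fenwick trees over coordinate-compressed values (names are illustrative; the author's code below implements the same idea with a generic fenwick template):

#include <bits/stdc++.h>
using namespace std;

// Two Fenwick trees indexed by compressed value: one holds counts, one holds
// sums. Insert/erase are point updates, and the sum of the k smallest current
// elements is found by walking down the bits of the count tree.
struct KSmallest {
    int n, LOG = 0;
    vector<long long> cnt, sum, vals;  // vals: sorted distinct values

    KSmallest(vector<long long> sortedDistinct) : vals(move(sortedDistinct)) {
        n = vals.size();
        while ((1 << LOG) <= n) ++LOG;
        cnt.assign(n + 1, 0);
        sum.assign(n + 1, 0);
    }
    void update(long long x, int delta) {  // delta = +1 insert, -1 erase
        int i = lower_bound(vals.begin(), vals.end(), x) - vals.begin() + 1;
        for (; i <= n; i += i & -i) cnt[i] += delta, sum[i] += delta * x;
    }
    long long kSmallestSum(long long k) {  // assumes k <= number of elements
        if (k == 0) return 0;
        int pos = 0;
        long long s = 0, need = k;
        for (int b = LOG - 1; b >= 0; --b) {
            int nxt = pos + (1 << b);
            if (nxt <= n && cnt[nxt] < need) {
                need -= cnt[nxt];  // take everything in this subtree...
                s += sum[nxt];     // ...and its sum
                pos = nxt;
            }
        }
        // bucket pos+1 holds the k-th smallest value, vals[pos]; take only
        // the `need` copies of it that are still required
        return s + need * vals[pos];
    }
};

int main() {
    KSmallest ks({1, 3, 5});
    ks.update(3, +1); ks.update(1, +1); ks.update(3, +1);  // multiset {1,3,3}
    cout << ks.kSmallestSum(2) << '\n';                    // prints 4 (= 1+3)
}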
Now, in our case we have to query different prefixes. However, as it turns out, this can be done ‘naively’, in similar fashion to dealing with an array with modifications.
That is, let's store a “global” segtree/Fenwick tree built on values (to quickly compute prefix frequencies and sums). Suppose we've processed the prefix up to i_1, then recurse and need to process the prefix up to i_2.
The ‘naive’ way of doing this is to just perform |i_1 - i_2| operations: if i_1 \leq i_2, insert the elements at positions i_1+1, \ldots, i_2; otherwise, delete the elements at positions i_2+1, \ldots, i_1.
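In code, this is just a pointer walk; a minimal sketch, assuming ks is the KSmallest structure from the earlier sketch and a is 1-indexed:

// 'Naive' prefix transition: ptr tracks how many elements of a[] are
// currently inside the structure; moving to prefix i2 costs |ptr - i2|
// point updates.
void moveTo(int i2, int& ptr, const vector<long long>& a, KSmallest& ks) {
    while (ptr < i2) { ++ptr; ks.update(a[ptr], +1); }  // grow the prefix
    while (ptr > i2) { ks.update(a[ptr], -1); --ptr; }  // shrink the prefix
}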
As it turns out, due to the nature of our recursion, this ‘naive’ process is fast enough! Let T(n) denote the work done for an array of size n.
We first process index \frac{N}{2}, which needs \mathcal{O}(N) operations; then we recurse into one side (say the left) and compute it fully, and then recurse into the right and compute it fully.
The left recursion does T(\frac{N}{2}) work overall. The same applies to the right side; and on top of that, we do another \mathcal{O}(N) work when transitioning from the left half to the right half (which happens only once).
So, the overall work done is T(N) = \mathcal{O}(N) + 2T(\frac{N}{2}), which is famously \mathcal{O}(N\log N).
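For completeness, unrolling the recurrence (writing the \mathcal{O}(N) term as cN) shows each of the \log_2 N levels contributing cN work:
T(N) = cN + 2T(\frac{N}{2}) = cN + cN + 4T(\frac{N}{4}) = \dots = cN\log_2 N + N \cdot T(1) = \mathcal{O}(N\log N).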
Each of these operations is a Fenwick tree update or query costing \mathcal{O}(\log N), so the total is \mathcal{O}(N\log^2 N), which is fast enough.
TIME COMPLEXITY:
\mathcal{O}(N\log^2 N) per testcase.
CODE:
Author's code (C++)
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

template<typename T> using Tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;
typedef long long int ll;
typedef long double ld;
typedef pair<int,int> pii;
typedef pair<ll,ll> pll;

#define fastio ios_base::sync_with_stdio(false); cin.tie(NULL)
#define pb push_back
#define endl '\n'
#define sz(a) (int)a.size()
#define setbits(x) __builtin_popcountll(x)
#define ff first
#define ss second
#define conts continue
#define ceil2(x,y) ((x+y-1)/(y))
#define all(a) a.begin(), a.end()
#define rall(a) a.rbegin(), a.rend()
#define yes cout << "Yes" << endl
#define no cout << "No" << endl
#define rep(i,n) for(int i = 0; i < n; ++i)
#define rep1(i,n) for(int i = 1; i <= n; ++i)
#define rev(i,s,e) for(int i = s; i >= e; --i)
#define trav(i,a) for(auto &i : a)

template<typename T>
void amin(T &a, T b) {
    a = min(a,b);
}

template<typename T>
void amax(T &a, T b) {
    a = max(a,b);
}

#ifdef LOCAL
#include "debug.h"
#else
#define debug(...) 42
#endif

/*

*/

const int MOD = 1e9 + 7;
const int N = 1e5 + 5;
const int inf1 = int(1e9) + 5;
const ll inf2 = ll(1e18) + 5;

template<typename T>
struct fenwick {
    int n;
    vector<T> tr;
    int LOG = 0;

    fenwick() {
    }

    fenwick(int n_) {
        n = n_;
        tr = vector<T>(n + 1);
        while((1<<LOG) <= n) LOG++;
    }

    int lsb(int x) {
        return x & -x;
    }

    void pupd(int i, T v) {
        for(; i <= n; i += lsb(i)){
            tr[i] += v;
        }
    }

    T sum(int i) {
        T res = 0;
        for(; i; i ^= lsb(i)){
            res += tr[i];
        }
        return res;
    }

    T query(int l, int r) {
        if (l > r) return 0;
        T res = sum(r) - sum(l - 1);
        return res;
    }

    int lower_bound(T s){
        // first pos with sum >= s
        if(sum(n) < s) return n+1;
        int i = 0;
        rev(bit,LOG-1,0){
            int j = i+(1<<bit);
            if(j > n) conts;
            if(tr[j] < s){
                s -= tr[j];
                i = j;
            }
        }
        return i+1;
    }

    int upper_bound(T s){
        return lower_bound(s+1);
    }
};

void solve(int test_case){
    ll n; cin >> n;
    vector<ll> a(n+5);
    rep1(i,n) cin >> a[i];

    // coordinate-compress the values (b[0] = -inf sentinel)
    vector<ll> b;
    rep1(i,n) b.pb(a[i]);
    b.pb(-inf2);
    sort(all(b));
    b.resize(unique(all(b))-b.begin());
    vector<ll> ca(n+5);
    rep1(i,n) ca[i] = lower_bound(all(b),a[i])-b.begin();

    vector<ll> ans(n+5);
    fenwick<ll> fenw_cnt(n+5), fenw_sum(n+5);
    ll ptr = 0; // max added guy

    auto get = [&](ll i, ll j){
        // considering a[1..i], what is the val of j^2-(sum of j smallest)
        // expand
        while(ptr < i){
            ptr++;
            fenw_cnt.pupd(ca[ptr],1);
            fenw_sum.pupd(ca[ptr],a[ptr]);
        }
        // contract
        while(ptr > i){
            fenw_cnt.pupd(ca[ptr],-1);
            fenw_sum.pupd(ca[ptr],-a[ptr]);
            ptr--;
        }
        // find first pos where cnt >= j
        ll p = fenw_cnt.lower_bound(j);
        ll cnt = fenw_cnt.sum(p);
        ll sum = fenw_sum.sum(p);
        assert(cnt >= j);
        sum -= (cnt-j)*b[p]; // drop the extra copies of the j-th smallest value
        return j*j-sum;
    };

    vector<ll> opt_val(n+5);

    // divide and conquer on the optimum: opt_i is non-decreasing
    auto go = [&](ll l, ll r, ll optl, ll optr, auto &&go) -> void{
        if(l > r) return;
        ll mid = (l+r)>>1;
        ll best = -inf2, opt_mid = -1;
        for(int j = optl; j <= optr; ++j){
            if(j > mid) break;
            ll val = get(mid,j);
            if(val > best){
                best = val;
                opt_mid = j;
            }
        }
        ans[mid] = best;
        opt_val[mid] = opt_mid;
        go(l,mid-1,optl,opt_mid,go);
        go(mid+1,r,opt_mid,optr,go);
    };

    go(1,n,0,n,go);

    // add back T, the sum of the prefix
    ll s = 0;
    rep1(i,n){
        s += a[i];
        ans[i] += s;
    }

    rep1(i,n) cout << ans[i] << " ";
    cout << endl;
}

int main()
{
    fastio;

    int t = 1;
    cin >> t;

    rep1(i, t) {
        solve(i);
    }

    return 0;
}