SPEEDRUN3 - Editorial

iceknight1093 · July 13, 2025, 5:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author:
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

Medium

PREREQUISITES:

Stacks

PROBLEM:

There’s a videogame with N levels. Level i needs at least A_i armor points to be cleared.
Define f(i, j, K) as follows:

Start at level i with 0 armor points.
You can spend K seconds to increase your armor points by 1, or 0 seconds to decrease it by 1.
You can choose to complete a level once you have its requisite armor points (or more).
Completing a level when you have y armor points requires y seconds.
Completing level i puts you at level i+1, then i+2, and so on till j is completed.
f(i, j, K) is the minimum time required to complete level j, starting at i.

You are given Q queries.
For each query, you are given a parameter K.
Compute the value

\sum_{i=1}^N \sum_{j=i}^N f(i, j, K)

N, K, Q \leq 2\cdot 10^5.

EXPLANATION:

First, a recap of the solution to a single query.

We created (weighted) edges of the form (i, \text{nxt}[i]) and (\text{prv}[i], i) where \text{nxt} and \text{prv} are the next/previous larger element for each index.
These edges were then processed in ascending order, and when dealing with edge (l, r), we overwrote the entire existing path l \to r if it was profitable to do so.
The correctness of this follows from the fact that the edges don’t intersect each other non-trivially.

We now have to deal with queries for different values of K.
Since K \leq N and Q can be as large as N, we can treat this as just having to solve for all K = 1, 2, 3, \ldots, N.

Looking back at our algorithm, what actually changes with K?
The edges themselves remain the same, since they’re based on just the array A.
The only real change is in the weight of the edges - so let’s look at what these weights exactly are.

For the edge (l, r), we have (for a fixed K):

(r-l-1) \cdot \min(A_l, A_r) for the indices between l and r.
A_r for index r itself.
If A_l \lt A_r, an additional K\cdot (A_r - A_l) factor.

In any case, note that the overall cost looks like K\cdot x + y for some non-negative constants x and y.
Parameterized by K, this is a line.

In particular, note that this means the cost itself is also always a linear equation in K, since we’re just adding up several linear equations.

For each edge (l, r), the only decision we need to make is whether to replace the existing path l \to x_1 \to\ldots\to x_k \to r by the edge l\to r.
Our next goal is figuring out exactly which values of K this needs to be done for.

Here, observe that the replacement will always be done for some “large enough” K, i.e. there will exist a constant K_0 such that for all K \geq K_0, it’s optimal to replace; and for all K \lt K_0 it’s not optimal.
To see why:

If A_l \geq A_r, the cost of the edge will be a constant.
Meanwhile, the cost of the internal path will always be linear in K, as noted above.
Since the coefficients are positive, for large enough K it will be better to always take the constant.
If A_l \lt A_r, the cost of the edge will be linear, with slope (A_r - A_l).
Any internal path must end with a line with slope (A_r - A_x) for some l \lt x \lt r.
However, we know that A_x \lt A_l for all x between l and r, so this is a strictly larger slope; and since the overall equation just adds lines, the final slope will be larger than A_r - A_l too.
So, for a large enough value, the line with slope A_r - A_l will be cheaper.

Further, it can be seen that the value of K_0 for an edge (l, r) will be larger than the values of K_0 for all edges contained within (l, r).

With the above knowledge, we’re able to do the following: process edges in ascending order of length, as before.
When processing (l, r), consider the existing path l \to \ldots\to r.
This path will give some linear equation, when added up.
From this equation, compute the value of K_0 for the edge (l, r) (which is basically just finding the x-coordinate of the intersection of the two lines; alternately you can binary search) - then replace the path l \to\ldots\to r with the edge l\to r.

This way, for each edge we obtain two events: a starting time (K_0) and an ending time (the K_0 of a later edge).
The edge (l, r) will thus contribute its K\cdot x + y value for all K in some range, to all segments that contain l and r.

This allows us to store, for each K, a pair (x_K, y_K) of values, denoting the answer for K is K\cdot x_K + y_K.
Each edge will then correspond to one range addition update on the x_K array and one range addition on the y_K array.
This can be handled using difference arrays and prefix sums (since the updates are before any queries), or a segment tree.

Once all the (x_K, y_K) values are known, answering a single query can be done in constant time.
With proper implementation everywhere, this is linear time, but additional log factors will also comfortably pass if you choose to implement that way.

TIME COMPLEXITY:

\mathcal{O}(N + Q) per testcase.

CODE:

Editorialist's code (C++)

// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, q; cin >> n >> q;
        vector a(n, 0);
        for (int &x : a) cin >> x;

        const int mod = 1e9 + 7;
        vector nxt(n, n), prv(n, -1);
        vector transitions(n, vector<int>());

        {
            stack<int> st;
            for (int i = 0; i < n; ++i) {
                while (!st.empty()) {
                    if (a[i] > a[st.top()]) st.pop();
                    else break;
                }
                if (!st.empty()) prv[i] = st.top();
                st.push(i);
            }
        }

        {
            stack<int> st;
            for (int i = n-1; i >= 0; --i) {
                while (!st.empty()) {
                    if (a[i] > a[st.top()]) st.pop();
                    else break;
                }
                if (!st.empty()) nxt[i] = st.top();
                st.push(i);
            }
        }
        
        vector segments(n+1, vector<array<int, 2>>());
        for (int i = 0; i < n; ++i) {
            if (nxt[i] < n) segments[nxt[i]-i].push_back({i, nxt[i]});
            if (prv[i] >= 0) segments[i-prv[i]].push_back({prv[i], i});
        }
        
        using u128 = __uint128_t;
        vector link(n, -1ll), curk(n, 0ll);
        vector<array<u128, 2>> val(n), pref(n+1);
        for (int x = 1; x <= n; ++x) {
            for (auto [l, r] : segments[x]) {
                u128 c = 1ll*(r-l-1)*min(a[l], a[r]) + a[r], m = 0;
                if (a[l] < a[r]) m = (a[r] - a[l]);
                u128 ct = 1ll*(l+1)*(n-r);
                
                if (l+1 == r) {
                    if (link[l] >= 0) continue;
                    pref[1][0] += m*ct;
                    pref[1][1] += c*ct;
                    val[l] = {m, c};
                    link[l] = r;
                    curk[l] = 1;
                }
                else {
                    u128 cm = 0, cc = 0;
                    int p = l;
                    while (p < r) {
                        cm += val[p][0], cc += val[p][1];
                        p = link[p];
                    }

                    // Find smallest k in [1, n] which makes (m, c) better
                    int lo = 1, hi = n+1;
                    while (lo < hi) {
                        int mid = (lo + hi) / 2;
                        if (m*mid + c <= cm*mid + cc) hi = mid;
                        else lo = mid + 1;
                    }

                    p = l;
                    while (p < r) {
                        pref[lo][0] -= val[p][0]*ct;
                        pref[lo][1] -= val[p][1]*ct;
                        p = link[p];
                    }

                    link[l] = r;
                    val[l] = {m, c};
                    pref[lo][0] += m*ct;
                    pref[lo][1] += c*ct;
                }

            }
        }

        // Also add sum (n-i)*(a[i]*k + a[i]) across all i, for starting points
        for (int i = 0; i < n; ++i) {
            pref[1][0] += 1ll*(n-i)*a[i];
            pref[1][1] += 1ll*(n-i)*a[i];
        }
        
        for (int i = 2; i <= n; ++i) {
            pref[i][0] += pref[i-1][0];
            pref[i][1] += pref[i-1][1];
        }


        while (q--) {
            int k; cin >> k;
            u128 ans = pref[k][0]*k + pref[k][1];
            ans %= mod;
            cout << int(ans) << '\n';
        }
    }
}

Author's code (C++)

#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

struct ufds{
    vector <int> root, sz, L, R;
    int n;
 
    void init(int nn){
        n = nn;
        root.resize(n + 1);
        sz.resize(n + 1, 1);
        L.resize(n + 1);
        R.resize(n + 1);
        for (int i = 1; i <= n; i++) L[i] = R[i] = i;
        for (int i = 1; i <= n; i++) root[i] = i;
    }
 
    int find(int x){
        if (root[x] == x) return x;
        return root[x] = find(root[x]);
    }
 
    bool unite(int x, int y){
        x = find(x); y = find(y);
        if (x == y) return false;
 
        if (sz[y] > sz[x]) swap(x, y);
        sz[x] += sz[y];
        root[y] = x;
        L[x] = min(L[x], L[y]);
        R[x] = max(R[x], R[y]);
        return true;
    }
};

const int mod = 1e9 + 7;

void Solve() 
{
    // skip the segment of 0s at the start
    // for the one-segment, you pay a cost of k to increase, and then another cost of x obviously
    // for the zero-segment, you pay a cost of either k or x => min(k, x) 
    // if the zero segment is on the edges => no cost 
    
    int n, q; cin >> n >> q;
    
    vector <int> a(n + 1);
    for (int i = 1; i <= n; i++){
        cin >> a[i];
    }
    
    int sum_cost = 0;
    for (int i = 1; i <= n; i++){
        int part = i * (n - i + 1);
        sum_cost += part * a[i];
        sum_cost %= mod;
    }
    
    int inc_cost = (n * n * (n + 1) / 2) % mod;
    
    vector<vector<int>> pos(n + 1);
    for (int i = 1; i <= n; i++){
        pos[a[i]].push_back(i);
    }
    
    vector <int> zero_cost(n + 1);
    
    auto compute_cost = [&](int l, int r, int x){
        // max(a[L], ..., a[R]) = x 
        if (l == 1 || r == n) return;
        int num = min(a[l - 1], a[r + 1]) - x;
        int part = (l - 1) * (n - r);
        
        zero_cost[r - l + 1] += num * part % mod;
        zero_cost[r - l + 1] %= mod;
    };
    
    ufds uf;
    uf.init(n + 1);
    
    int have = 0;
    
    vector <bool> alive(n + 2, false);
    for (int i = 1; i <= n; i++){
        inc_cost -= have;
        
        for (auto x : pos[i]){
            have += 1;
            
            if (alive[x - 1]){
                have += uf.sz[uf.find(x - 1)] * uf.sz[uf.find(x)];
                uf.unite(x - 1, x);
            }
            if (alive[x + 1]){
                have += uf.sz[uf.find(x)] * uf.sz[uf.find(x + 1)];
                uf.unite(x, x + 1);
            }
            alive[x] = true;
            
            compute_cost(uf.L[uf.find(x)], uf.R[uf.find(x)], i);
        }
    }
    inc_cost %= mod;
    if (inc_cost < 0) inc_cost += mod;
    
    vector <int> ans(n + 1);
    
    // for k = x, for < x, we want sum of zero_cost[x] * x, for greater, we want sum zero_cost * k 
    int sum_ge = 0, sum_le = 0;
    for (int i = 1; i <= n; i++){
        sum_ge += zero_cost[i];
        sum_ge %= mod;
    }
    
    for (int i = 1; i <= n; i++){
        sum_ge -= zero_cost[i];
        if (sum_ge < 0) sum_ge += mod;
        
        sum_le += zero_cost[i] * i;
        sum_le %= mod;
        
        ans[i] = sum_ge * i + sum_le + inc_cost * i + sum_cost;
        ans[i] %= mod;
    }
    
    while (q--){
        int k; cin >> k;
        cout << ans[k] << "\n";
    }
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}

Tester's code (C++)

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

using namespace std;
using namespace __gnu_pbds;

template<typename T> using Tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;
typedef long long int ll;
typedef long double ld;
typedef pair<int,int> pii;
typedef pair<ll,ll> pll;

#define fastio ios_base::sync_with_stdio(false); cin.tie(NULL)
#define pb push_back
#define endl '\n'
#define sz(a) (int)a.size()
#define setbits(x) __builtin_popcountll(x)
#define ff first
#define ss second
#define conts continue
#define ceil2(x,y) ((x+y-1)/(y))
#define all(a) a.begin(), a.end()
#define rall(a) a.rbegin(), a.rend()
#define yes cout << "Yes" << endl
#define no cout << "No" << endl

#define rep(i,n) for(int i = 0; i < n; ++i)
#define rep1(i,n) for(int i = 1; i <= n; ++i)
#define rev(i,s,e) for(int i = s; i >= e; --i)
#define trav(i,a) for(auto &i : a)

template<typename T>
void amin(T &a, T b) {
    a = min(a,b);
}

template<typename T>
void amax(T &a, T b) {
    a = max(a,b);
}

#ifdef LOCAL
#include "debug.h"
#else
#define debug(...) 42
#endif

/*



*/

const int MOD = 1e9 + 7;
const int N = 1e5 + 5;
const int inf1 = int(1e9) + 5;
const ll inf2 = ll(1e18) + 5;

void solve(int test_case){
    ll n,q; cin >> n >> q;
    vector<ll> a(n+5);
    rep1(i,n) cin >> a[i];
    
    vector<ll> here[n+5];
    rep1(i,n) here[a[i]].pb(i);

    set<ll> st;
    vector<ll> d1(n+5), d2(n+5);

    auto upd_pair = [&](ll i, ll j, ll x, ll c){
        // c = 1: insertion, c = -1: deletion
        assert(j >= i);
        ll d = j-i;

        if(i != -1){
            d1[d] += i*(n-j+1)%MOD*(j-i-1)%MOD*x%MOD*c;
            d1[d] = (d1[d]%MOD+MOD)%MOD;
        }

        if(i == -1){
            d2[0] += j*(n-j+1)*x*c;
            d2[0] = (d2[0]%MOD+MOD)%MOD;
        }
        else{
            d2[0] += j*(n-j+1)*x*c;
            d2[d] -= j*(n-j+1)*x*c;
            d2[d] += (j-i)*(n-j+1)*x*c;
            
            d2[0] = (d2[0]%MOD+MOD)%MOD;
            d2[d] = (d2[d]%MOD+MOD)%MOD;
        }
    };

    rev(x,n,1){
        // all here[x] will turn from 0 to 1
        trav(i,here[x]){
            // activate i
            d1[0] += i*(n-i+1)*x;
            d1[0] %= MOD;

            auto it = st.upper_bound(i);
            ll l = -1, r = -1;

            if(it != st.begin()){
                l = *prev(it);
            }
            if(it != st.end()){
                r = *it;
            }

            // remove old pairs
            if(r != -1){
                upd_pair(l,r,x,-1);
            }

            // insert new pairs
            upd_pair(l,i,x,1);
            
            if(r != -1){
                upd_pair(i,r,x,1);
            }

            st.insert(i);
        }
    }

    rep1(i,n){
        d1[i] += d1[i-1];
        d2[i] += d2[i-1];
        d1[i] %= MOD, d2[i] %= MOD;
    }

    vector<ll> ans(n+5);
    rep1(k,n){
        ans[k] = d1[k]+d2[k]*k;
        ans[k] %= MOD;
    }

    while(q--){
        ll k; cin >> k;
        cout << ans[k] << endl;
    }

    /*

    vector<ll> ans(n+5);

    rep1(k,n){
        rep1(x,n){
            auto oa = a;
            rep1(i,n){
                if(a[i] >= x) a[i] = 1;
                else a[i] = 0;
            }

            ll sum = 0;
            ll prev_one = -1;

            rep1(i,n){
                if(a[i] == 1){
                    sum += i*(n-i+1);
                    if(prev_one != -1 and i-prev_one <= k){
                        sum += (i-prev_one)*(n-i+1)*k;
                    }
                    else{
                        sum += i*(n-i+1)*k;
                    }

                    if(prev_one != -1){
                        if(i-prev_one <= k){
                            sum += prev_one*(n-i+1)*(i-prev_one-1);
                        }
                    }
                    
                    prev_one = i;
                }
            }

            ans[k] += sum;
            ans[k] %= MOD;

            a = oa;
        }
    }

    */
}

int main()
{
    fastio;

    int t = 1;
    cin >> t;

    rep1(i, t) {
        solve(i);
    }

    return 0;
}