SPEEDRUN2 - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author:
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

Easy

PREREQUISITES:

Stacks

PROBLEM:

There’s a videogame with N levels. Level i needs at least A_i armor points to be cleared.
Define f(i, j, K) as follows:

  • Start at level i with 0 armor points.
  • You can spend K seconds to increase your armor points by 1, or 0 seconds to decrease it by 1.
  • You can choose to complete a level once you have its requisite armor points (or more).
    Completing a level when you have y armor points requires y seconds.
  • Completing level i puts you at level i+1, then i+2, and so on till j is completed.
  • f(i, j, K) is the minimum time required to complete level j, starting at i.

You are given a parameter K.
Compute the value

\sum_{i=1}^N \sum_{j=i}^N f(i, j, K)

N \leq 2\cdot 10^5 here.

EXPLANATION:

First, we recap the solution to the easy version.

To solve for a single array A in linear time, we called a level good if it was completed while having exactly A_i armor points, then defined dp_i to be the minimum cost of completing till the i-th level while also ensuring level i is good.
dp_i only needs to be updated from dp_j for all j \lt i for which either j = \text{prv}[i] or \text{nxt}[j] = i, and there are only \mathcal{O}(N) pairs of (i, j) across the entire array.
Here, \text{nxt}[i] is the smallest index \gt i containing an element larger than A_i, and \text{prv}[i] is defined similarly but for the left.

This \mathcal{O}(N) DP was then applied to each suffix of the array A, for \mathcal{O}(N^2).


We can no longer apply the DP to each suffix of the array separately.
Let’s take a closer look at what’s going on.

For our DP, we have several weighted “edges”, each of the form (i, \text{nxt}[i]) or (\text{prv}[i], i).
We’re essentially using these edges to compute a shortest path from 1 to N (or 1 to every other vertex, really).

One important observation here is that the edges don’t intersect each other non-trivially.
That is, if we have two edges (l_1, r_1) and (l_2, r_2), with l_1 \leq l_2, we’ll have either r_1 \leq l_2 or r_2 \leq r_1.
That is, when looking at the edges as segments, each pair of segments will either be disjoint, or one will be contained inside the other (at best, they can touch at an endpoint).

It’s not hard to see why: for example if we look at the edge (i, \text{nxt}[i]), every element between them cannot have a \text{nxt}[j] value that’s \gt \text{nxt}[i], since \text{nxt}[i] already contains an element larger than all of them.
Similarly, no j \lt i can have i \lt \text{nxt}[j] \lt \text{nxt}[i], since A_i is larger than everything in [i+1, \text{nxt}[i]-1] already.
Similar logic can be applied to the (\text{prv}[j], j) edges to see that they won’t intersect [i, \text{nxt}[i]] either.


The above observation about edges not intersecting allows us to run a more “global” algorithm to compute the minimum cost, as follows.

Let’s process all the edges (l, r) in ascending order of their size.
When processing the edge (l, r), we’ll already have some path of the form
l \to x_1 \to x_2 \to \ldots \to r
which has some cost.

If the cost of this path is greater than the cost of the edge (l, r), we can simply replace the entire path by the edge l \to r.
This is only correct because of the non-intersecting property of edges: since all smaller edges have been processed already, we’re guaranteed that no future edge can ever involve a vertex in [l+1, r-1] which is why this replacement is valid.

In the end, we’ll end up with some path from 1 to N, and the cost of this path is exactly the answer we want.


The nice thing about the above algorithm is that it extends to summing over all subarrays without too much effort.

Once again, let’s process the edges in ascending order of their length.
When we’re processing (l, r), some path l \to \ldots \to x_i \to \ldots r already exists, just as before.
If this edge is cheaper than the path, once again we replace the path with the edge l \to r.
However, now note that this replacement will be something we do when solving for any subarray [L, R] such that L \leq l \lt r \leq R, i.e for any subarray that contains this edge.

So, all we need to do is multiply the magnitude of the change by the number of subarrays containing the edge!

This entire thing can be implemented in linear time:

  • Computing \text{nxt}[i] and \text{prv}[i] are done in linear time using a stack, so we have all the edges we need.
  • Sorting the edges by length can be done in linear time since all lengths are \leq N.
    For example, store N lists of edges and push each edge to the list corresponding to its length, then process these lists in order.
  • When updating the edge l\to r (and hence deleting an existing path), one way is to store the existing path as a linked list, as in storing the next vertex on the path and such.
    This allows for easy deletion/insertion by just updating a couple of values; and takes linear time overall since each edge can be inserted/deleted at most once.

Of course, there are several ways to do this in \mathcal{O}(N\log N) as well, which is perfectly acceptable too.


With the higher constraints, the answer being required modulo 10^9 + 7 is actually relevant, since N^4 no longer fits into the limit of a 64-bit integer.

Since we’re attempting to take the minimum of certain values during our algorithm, we can’t actually store everything reduced under mod - the comparisons will be thrown off.
However, it can be seen that what we take the minimum of are only the edge costs (and the sums of these edge costs).
These values are bounded by \mathcal{O}(N^3) instead (at worst, we add up N edge costs each of value \mathcal{O}(N^2)), which does fit into a 64-bit integer.
So, it’s entirely possible to perform the algorithm normally, and reduce under mod only when they’re being added to the answer (and some multiplications are needed).

Alternately, 128-bit integers can be used.

TIME COMPLEXITY:

\mathcal{O}(N) or \mathcal{O}(N\log N) per testcase.

CODE:

Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

// https://github.com/kth-competitive-programming/kactl/blob/main/content/data-structures/FenwickTree.h
struct FT {
    vector<ll> s;
    FT(int n) : s(n) {}
    void update(int pos, ll dif) { // a[pos] += dif
        for (; pos < ssize(s); pos |= pos + 1) s[pos] += dif;
    }
    ll query(int pos) { // sum of values in [0, pos)
        ll res = 0;
        for (; pos > 0; pos &= pos - 1) res += s[pos-1];
        return res;
    }
};

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, q; cin >> n >> q;
        vector a(n, 0);
        for (int &x : a) cin >> x;
        int k; cin >> k;

        const int mod = 1e9 + 7;
        vector nxt(n, n), prv(n, -1);
        vector transitions(n, vector<int>());

        {
            stack<int> st;
            for (int i = 0; i < n; ++i) {
                while (!st.empty()) {
                    if (a[i] > a[st.top()]) st.pop();
                    else break;
                }
                if (!st.empty()) prv[i] = st.top();
                st.push(i);
            }
        }

        {
            stack<int> st;
            for (int i = n-1; i >= 0; --i) {
                while (!st.empty()) {
                    if (a[i] > a[st.top()]) st.pop();
                    else break;
                }
                if (!st.empty()) nxt[i] = st.top();
                st.push(i);
            }
        }
        
        vector segments(n+1, vector<array<int, 2>>());
        for (int i = 0; i < n; ++i) {
            if (nxt[i] < n) segments[nxt[i]-i].push_back({i, nxt[i]});
            if (prv[i] >= 0) segments[i-prv[i]].push_back({prv[i], i});
        }
        
        ll ans = 0;
        FT fen(n);
        vector link(n, -1ll), val(n, 0ll);
        for (int x = 1; x <= n; ++x) {
            for (auto [l, r] : segments[x]) {
                ll cost = 1ll*(r-l-1)*min(a[l], a[r]) + a[r];
                if (a[l] < a[r]) cost += 1ll*k*(a[r] - a[l]);
                ll ct = 1ll*(l+1)*(n-r) % mod;
                
                if (l+1 == r) {
                    if (link[l] >= 0) continue;
                    ans += ct*(cost % mod);
                    ans %= mod;
                    link[l] = r;
                    val[l] = cost;
                    fen.update(l, cost);
                }
                else {
                    ll cur = fen.query(r) - fen.query(l);
                    if (cost < cur) {
                        ans += ct*((cost - cur) % mod);
                        ans %= mod;
                        
                        int p = l;
                        while (p < r) {
                            fen.update(p, -val[p]);
                            p = link[p];
                        }
                        link[l] = r;
                        val[l] = cost;
                        fen.update(l, cost);
                    }
                }

            }
        }

        for (int i = 0; i < n; ++i) {
            ans += 1ll*(n-i)*(1ll*k*a[i] + a[i]);
            ans %= mod;
        }

        cout << ans % mod << '\n';
    }
}
Tester's code (C++)
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

using namespace std;
using namespace __gnu_pbds;

template<typename T> using Tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;
typedef long long int ll;
typedef long double ld;
typedef pair<int,int> pii;
typedef pair<ll,ll> pll;

#define fastio ios_base::sync_with_stdio(false); cin.tie(NULL)
#define pb push_back
#define endl '\n'
#define sz(a) (int)a.size()
#define setbits(x) __builtin_popcountll(x)
#define ff first
#define ss second
#define conts continue
#define ceil2(x,y) ((x+y-1)/(y))
#define all(a) a.begin(), a.end()
#define rall(a) a.rbegin(), a.rend()
#define yes cout << "Yes" << endl
#define no cout << "No" << endl

#define rep(i,n) for(int i = 0; i < n; ++i)
#define rep1(i,n) for(int i = 1; i <= n; ++i)
#define rev(i,s,e) for(int i = s; i >= e; --i)
#define trav(i,a) for(auto &i : a)

template<typename T>
void amin(T &a, T b) {
    a = min(a,b);
}

template<typename T>
void amax(T &a, T b) {
    a = max(a,b);
}

#ifdef LOCAL
#include "debug.h"
#else
#define debug(...) 42
#endif

/*



*/

const int MOD = 1e9 + 7;
const int N = 1e5 + 5;
const int inf1 = int(1e9) + 5;
const ll inf2 = ll(1e18) + 5;

void solve(int test_case){
    ll n,q; cin >> n >> q;
    vector<ll> a(n+5);
    rep1(i,n) cin >> a[i];
    ll k; cin >> k;
    
    vector<ll> dp1(n+5), dp2(n+5);
    deque<pll> dq;

    rev(i,n,1){
        while(!dq.empty() and dq.back().ff > i+k){
            dq.pop_back();
        }

        dp1[i] = dp2[i] = a[i];

        while(!dq.empty()){
            ll l = dq[0].ff;

            if(dq[0].ss > a[i]){
                ll add = ((l-i)*a[i]+k*(a[l]-a[i]))%MOD*(n-l+1)+dp1[l];
                dp1[i] += add;
                dp1[i] %= MOD;
                break;
            }
            else if(sz(dq) == 1){
                ll add = (a[i]+(l-i-1)*a[l])%MOD*(n-l+1)+dp1[l];
                dp1[i] += add, dp2[i] += add;
                dp1[i] %= MOD, dp2[i] %= MOD;
                dq.pop_front();
                break;
            }
            else{
                ll r = dq[1].ff;
                ll add = (a[i]+(l-i-1)*a[l])%MOD*(r-l)+dp2[l];
                dp1[i] += add, dp2[i] += add;
                dp1[i] %= MOD, dp2[i] %= MOD;
                dq.pop_front();
            }
        }

        dq.push_front({i,a[i]});
    }

    rep1(i,n){
        dp1[i] += a[i]*k*(n-i+1);
        dp1[i] %= MOD;
    }

    ll ans = 0;
    rep1(i,n){
        ans += dp1[i];
        ans %= MOD;
    }

    cout << ans << endl;
}

int main()
{
    fastio;

    int t = 1;
    cin >> t;

    rep1(i, t) {
        solve(i);
    }

    return 0;
}