SPEEDRUN1 - Editorial

iceknight1093 · July 13, 2025, 5:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author:
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

Easy

PREREQUISITES:

Dynamic programming, stacks

PROBLEM:

There’s a videogame with N levels. Level i needs at least A_i armor points to be cleared.
Define f(i, j, K) as follows:

Start at level i with 0 armor points.
You can spend K seconds to increase your armor points by 1, or 0 seconds to decrease it by 1.
You can choose to complete a level once you have its requisite armor points (or more).
Completing a level when you have y armor points requires y seconds.
Completing level i puts you at level i+1, then i+2, and so on till j is completed.
f(i, j, K) is the minimum time required to complete level j, starting at i.

You are given a parameter K.
Compute the value

\sum_{i=1}^N \sum_{j=i}^N f(i, j, K)

N \leq 4000 here.

EXPLANATION:

Let’s forget the sum-over-subarrays part, and just solve for a single array A now, where we start at 1 and want to reach N.

A straightforward initial solution is to use dynamic programming.
Define dp[i][x] to be the minimum cost of completing level i while ending up with x armor points.
Transitions are quite simple: for each value of y, we can move from dp[i-1][y] to dp[i][x] by paying the appropriate cost (which is 0 if y \geq x, and K\cdot (x-y) otherwise), and then we incur a cost of x seconds when completing the i-th level itself, so we have

dp[i][x] = x + \min\left(\min_{y\geq x} \left(dp[i-1][y]\right), \min_{y\lt x} \left(dp[i-1][y] + K\cdot (x-y)\right)\right)

Since A_i \leq N, we only need to work with x \leq N. This gives us \mathcal{O}(N^2) states, with \mathcal{O}(N) transitions from each for \mathcal{O}(N^3) overall.

It’s not too hard to optimize the transitions to constant time: note that the y \geq x part is just a suffix minimum of the dp[i-1] array, while the y \lt x part can be optimized by taking prefix minimums of the (dp[i-1][y] - K\cdot y) values.
This gives us \mathcal{O}(N^2) for a single array.

Quadratic for a single array is obviously way too slow when it needs to be run over all subarrays, so let’s try to improve it further.

The bottleneck here is the number of states: we definitely can’t have \mathcal{O}(N^2) states.

Let’s call level i good, if in an optimal solution we complete level i while having exactly A_i armor points.
Note that levels 1 and N will always be good: there’s no point having more than A_1 armor points for the first level (we can always increase later if needed), and there’s no point having more than A_N armor points for the last level since decreasing is free.

Now, suppose levels i and j are consecutive good levels in an optimal solution (i.e. there are no good levels between i and j).
We can then say the following:

For each index x satisfying i \lt x \lt j, we must have A_x \lt \min(A_i, A_j).
When going through levels i+1, i+2, \ldots, j-1, it’s optimal to always keep our armor points at \min(A_i, A_j).

Proof

First, we show that A_x \lt \min(A_i, A_j) for all i \lt x \lt j.

Consider the leftmost index x with x\gt i and A_x \geq A_i.
As we move from i to x, in all the indices i, i+1, i+2, \ldots, x-1 we’ll definitely never have more than A_i armor points: it makes no sense to increase it yet when all their requirements are \leq A_i anyway.
This means, when reach level x, we’ll have \leq A_x armor points too: so the optimal solution will be to increase armor points to reach A_x itself.
But then this would make x also a good level - which by assumption it is not.
So, we can’t have A_x \geq A_i for any i \lt x \lt j.

Similarly, it can be shown that we can’t have A_x \geq A_j for any x in the range too: instead of the leftmost x look at the rightmost one, the reasoning is otherwise the same.
This shows that A_x \lt \min(A_i, A_j) for all i \lt x \lt j.

Next, we need to show that it’s optimal to keep the armor points at \min(A_i, A_j) while going through the middle.
Suppose it wasn’t, i.e. at some point we reduced our armor points below this threshold.
Then, we’ll definitely need to bring it back up to A_j to cross index j.
So, each reduction below \min(A_i, A_j) comes with a cost of K (for the ensuing increase), and then can save at most j-i-1 in terms of cost (at best we reduce the time for each level in the middle by 1).

If K \geq j-i-1 it’s obviously not optimal to ever do this.
If K \lt j-i-1 it will in fact be optimal to reduce: but in this case, observe that rather than reducing once, we can just keep reducing till we reach another value, since each reduction improves the cost: specifically, till we reach the maximum value in the range [i+1, j-1] (which, recall, is smaller than the endpoints.)
However, this would then result in said maximum value being a good level, contradicting i and j being consecutive good levels in an optimal solution.
So, this case cannot happen in an optimal solution; and so always staying at \min(A_i, A_j) is correct.

The first point is important, because it tells us the following fact: if i and j are consecutive good indices, then either i or j is the nearest greater element for the other index.
More precisely, if we define \text{nxt}[i] to be the smallest index \gt i containing a value \geq A_i, and \text{prv}[i] to be the largest index \lt i containing a value \geq A_i, then for consecutive good indices i and j, we must have either i = \text{prv}[j] or \text{nxt}[i] = j.

The \text{nxt} and \text{prv} arrays can be computed in linear time using a stack.

We can now use these ideas to optimize the DP further.

Define dp_i to be the minimum cost of completing level i, while ensuring that level i is good.
Then, dp_i only needs to be updated by considering all those j \lt i for which either j = \text{prv}[i], or \text{nxt}[j] = i.
For each such j:

The cost of reaching j is dp_j.
Everything between i and j will be completed at \min(A_i, A_j) armor points, as noted earlier.
This adds \min(A_i, A_j) \cdot (i-j-1) to the cost.
If A_j \lt A_i, we’ll also require an additional K\cdot (A_i - A_j) seconds to bring the armor points up after reaching i.
Of course, there’s also a cost of A_i to complete level i itself.

So, each j can be processed in constant time.
Since the transitions only encompass pairs (i, \text{nxt}[i]) or (i, \text{prv}[i]), there are at most 2N transitions in total, meaning we’ll have the value of dp_N in linear time.

dp_N is exactly the value we want, because as noted earlier it’s always optimal to end the last level on A_N armor points, i.e. index N is always good.

Let’s now move to solving for all subarrays.

There are \mathcal{O}(N^2) subarrays, and so directly solving for each one will result in a complexity of \mathcal{O}(N^3).
To optimize this, note that if the left endpoint of the subarray is fixed, running the DP till index N will implicitly compute the answers for all right endpoints anyway.
So, we can just run the DP on each suffix and sum up the entire DP array rather than take only the last value.
This makes the complexity \mathcal{O}(N^2), which is fast enough for this version of the task.

You might have noticed that the answer must be printed modulo 10^9 + 7.
For this version of the problem though, that doesn’t pose much of an issue.
Note that any f(i, j, K) value is trivially bounded above by KN + N^2 \leq 2N^2, because we can always perform N increases first and then just clear any level.
This means the sum of answers across all subarrays is bounded by 2N^4, which for the constraints here don’t exceed the limits of a 64-bit integer.
So, you can just compute the answer normally, and take modulo at the end.

TIME COMPLEXITY:

\mathcal{O}(N^2) per testcase.

CODE:

Editorialist's code (C++)

// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, q; cin >> n >> q;
        vector a(n, 0);
        for (int &x : a) cin >> x;
        int k; cin >> k;

        const int mod = 1e9 + 7;
        auto solve = [&] (auto v) {
            int m = size(v);
            vector nxt(m, m), prv(m, -1);
            vector transitions(m, vector<int>());
    
            {
                stack<int> st;
                for (int i = 0; i < m; ++i) {
                    while (!st.empty()) {
                        if (v[i] > v[st.top()]) st.pop();
                        else break;
                    }
                    if (!st.empty()) prv[i] = st.top();
                    st.push(i);
                }
            }
    
            {
                stack<int> st;
                for (int i = m-1; i >= 0; --i) {
                    while (!st.empty()) {
                        if (v[i] > v[st.top()]) st.pop();
                        else break;
                    }
                    if (!st.empty()) nxt[i] = st.top();
                    st.push(i);
                }
            }
    
            for (int i = 0; i < m; ++i) {
                if (nxt[i] < m) transitions[nxt[i]].push_back(i);
                if (prv[i] >= 0) transitions[i].push_back(prv[i]);
            }
    
            vector dp(m, LLONG_MAX);
            dp[0] = 1ll*k*v[0] + v[0];
            for (int i = 1; i < m; ++i) {
                for (int j : transitions[i]) {
                    ll cost = 1ll*(i-j-1)*min(a[i], a[j]) + a[i];
                    if (a[j] < a[i]) cost += k*(a[i] - a[j]);
                    dp[i] = min(dp[i], dp[j] + cost);
                }
            }

            ll res = 0;
            for (auto x : dp) {
                res += x;
                res %= mod;
            }
            return res;
        };

        ll ans = 0;
        for (int i = 0; i < n; ++i) {
            ans += solve(a);
            a.erase(a.begin());
        }
        cout << ans % mod << '\n';
    }
}