MILKYDARK - Editorial


Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: frtransform
Tester: sky_nik
Editorialist: iceknight1093




Dynamic programming


There N chocolates. The i-th has sweetness A_i and bitterness B_i.
The value of a continuous sequence of chocolates equals the minimum of its total sweetness and total bitterness.

Find the minimum possible overall value if all the chocolates have to be eaten in exactly K days, with at least one chocolate being eaten everyday.


Notice that the product N\cdot K isn’t too large.

This gives rise to a natural dynamic programming solution.
Let dp(i, x) denote the minimum overall value of dividing the first i chocolates into exactly x segments.

Then, we have

dp(i, x) = \min_{j \lt i} \left( dp(j, x-1) + \min(A_{j+1} + \ldots + A_i, B_{j+1} + \ldots + B_i) \right)

Let pA denote the prefix sum array of A, and pB denote that of B.
This allows us to write the expression more succinctly as

dp(i, x) = \min_{j \lt i} \left( dp(j, x-1) + \min(pA_i - pA_j, pB_i - pB_j) \right)

If implemented directly, this gives us a solution in \mathcal{O}(N^2K), still too slow.

There are now a couple of ways to optimize this.

Subtask 1

Note that pA_i - pA_j \leq pB_i - pB_j \iff pA_i - pB_i \leq pA_j - pB_j

Let’s define d_i := pA_i - pB_i.

  • For all j \lt i such that d_i \leq d_j, we want to take the value dp(j, x-1) + pA_i - pA_j.
    So, we just want the minimum of (dp(j, x-1) - pA_j) across all such indices, since pA_i is a constant.
  • For all j \lt i such that d_i \gt d_j, we want to take dp(j, x-1) + pB_i - pB_j instead.
    Again, we just need the minimum of dp(j, x-1) - pB_j across all such j, since pB_i is constant.

One way to find this quickly is to use a segment tree.
Sort the indices by their d_i values.
Then, maintain two segment trees: one holding the dp(j, x-1) - pA_j values, and one holding the dp(j, x-1) - pB_j values seen so far (indices not seen yet can be assumed to hold a value of \infty).

When processing an index i, query for the minimum in the appropriate range of both segment trees (one will be a suffix, one a prefix), and set dp(i, x) appropriately.
Then, insert dp(i, x-1) - pA_i and dp(i, x-1) - pB_i at their appropriate positions of the tree.

This allows for a solution in \mathcal{O}(NK\log N), which is fast enough to pass subtask 1.

Subtask 2

Rewriting the expression we have a bit,

dp(i, x) = \min_{j \lt i} \left( dp(j, x-1) + \min(pA_i - pA_j, pB_i - pB_j) \right) \\ = \min_{j \lt i} \left( \min(dp(j, x-1) + pA_i - pA_j, dp(j, x-1) + pB_i - pB_j) \right)

Note that we’re minimizing a minimum of two values across all j.
So, equivalently, we can minimize each of those values across all j, and then take the minimum of them both!
That is, we have

dp(i, x) = \min(\min_{j\lt i}(dp(j, x-1) + pA_i - pA_j), \min_{j\lt i}(dp(j, x-1) + pB_i - pB_j))

Computing \min_{j\lt i}(dp(j, x-1) + pA_i - pA_j) is pretty simple: all we need to know is the minimum value of dp(j, x-1) - pA_j across all j \lt i, which can simply be stored in a variable as we iterate.
Similarly, \min_{j\lt i}(dp(j, x-1) + pB_i - pB_j) is easily computed in constant time, at which point we can take the minimum of both.
The overall complexity is \mathcal{O}(NK) now.

Note that this can be easily implemented to use only \mathcal{O}(N) memory, since we only need to remember dp(\cdot, x-1) when computing the values of dp(\cdot, x).
Bringing the memory usage down drastically like this often speeds up your code too (though hopefully shouldn’t be needed for AC here).


\mathcal{O}(N\cdot K) per testcase.


Author's code (C++)
#include <bits/stdc++.h>

using namespace std;

const long long INF = 1LL << 60;
const int MaxN = 100005;

int A[MaxN], B[MaxN];
long long pA[MaxN], pB[MaxN];

void solve() {
    int N, K;
    cin >> N >> K;

    for (int i = 1; i <= N; i++) {
        cin >> A[i];
        pA[i] = pA[i - 1] + A[i];

    for (int i = 1; i <= N; i++) {
        cin >> B[i];
        pB[i] = pB[i - 1] + B[i];

    vector<long long> dp(N + 1, INF);
    dp[0] = 0;
    for (int k = 1; k <= K; k++) {
        vector<long long> ndp(N + 1, INF);
        long long mnA = dp[k - 1] - pA[k - 1];
        long long mnB = dp[k - 1] - pB[k - 1];
        for (int i = k; i <= N; i++) {
            ndp[i] = min(ndp[i], pA[i] + mnA);
            ndp[i] = min(ndp[i], pB[i] + mnB);

            mnA = min(mnA, dp[i] - pA[i]);
            mnB = min(mnB, dp[i] - pB[i]);

    cout << dp[N] << endl;


int main(){

    int tests;
    cin >> tests;
    while (tests--) {

    return 0;
Tester's code (C++)
#include <bits/stdc++.h>
using namespace std;

constexpr auto inf = numeric_limits<int64_t>::max() / 3;

int64_t solve(int n, int k, const vector<vector<int>>& a) {
  vector dp(k + 1, vector<int64_t>(n + 1, inf));
  dp[0][0] = 0;
  for (int i = 0; i < n; ++i) {
    for (int j = 1; j <= k; ++j) {
      dp[j][i + 1] = min(dp[j][i + 1], dp[j - 1][i] + a[j & 1][i]);
      dp[j][i + 1] = min(dp[j][i + 1], dp[j][i] + a[j & 1][i]);

  int64_t mn = inf;
  for (int j = 0; j <= k; ++j) {
    mn = min(mn, dp[j][n]);
  return mn;

int main() {
  int t; cin >> t; while (t--) {
    int n, k;
    cin >> n >> k;
    vector a(2, vector<int>(n));
    for (auto& ai : a) { 
      for (auto& aij : ai) {
        cin >> aij;

    const auto bitter = solve(n, k, a);
    swap(a[0], a[1]);
    const auto sweet = solve(n, k, a);
    cout << min(bitter, sweet) << '\n';
Editorialist's code (Python)
for _ in range(int(input())):
    n, k = map(int, input().split())
    a = [0] + list(map(int, input().split()))
    b = [0] + list(map(int, input().split()))
    prefa, prefb = a[:], b[:]
    for i in range(n):
        prefa[i+1] += prefa[i]
        prefb[i+1] += prefb[i]
    inf = 10**18
    dp = [inf]*(n+1)
    dp[0] = 0
    for x in range(k):
        ndp = [inf]*(n+1)
        mn1, mn2 = dp[0], dp[0]
        for i in range(1, n+1):
            ndp[i] = min(prefa[i] + mn1, prefb[i] + mn2)
            mn1 = min(mn1, dp[i] - prefa[i])
            mn2 = min(mn2, dp[i] - prefb[i])
        dp = ndp[:]

two questions :

Note that this can be easily implemented to use only O(N)\mathcal{O}(N)O(N) memory,

should’nt this be O(k) ??
also how does it speed up the code ?

No, the described solution uses \mathcal{O}(N) memory - dp(\cdot, x) refers to an array of length N indexed by the first element, keeping x constant.

As for speed: I’m not an expert on the topic, but generally speaking I believe allocating large amounts of memory (and specifically, accessing lots of random locations within said large chunk of memory) can cause major slowdowns in code due to cache misses and stuff.
At least, empirically I can say that I’ve definitely seen cases where reducing memory usage by a significant amount also helped improve running time.

1 Like

Ok thank you for clarifications.
But for the first part I think we can reduce it to O(k) memory by only remembering last row.

Maybe it’s possible, but also simply reading and storing the input results in \mathcal{O}(N) memory :slight_smile: