CYCADEQ - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: raysh07
Editorialist: iceknight1093

DIFFICULTY:

Simple

PREREQUISITES:

None

PROBLEM:

You have an array A of length N, filled with zeros.
You can do the following:

  1. Cyclically right-shift A.
    This has a cost of C.
  2. Choose an index i and add 1 to A_i.
    This has a cost of B_i.

Find the minimum cost of making the array A equal to the array D.

EXPLANATION:

For ease of notation, we assume the arrays are 0-indexed.

Let T denote the total number of times we perform the cyclic shift operation.
The total cost of this is C\times T, since each shift costs C.

Observe that the element initially at index i of A will end up at index (i+T)\bmod N.
Along the way, it passes through the indices i, i+1, i+2, \ldots (all taken modulo N) until it reaches (i+T)\bmod N.

Now, for A to equal D, the final value of this element must be D_{(i+T)\bmod N}.
Since it starts out equal to 0, this means it increases exactly D_{(i+T)\bmod N} times.

The question now is: what’s the minimum cost of performing these increases?
To answer that, let’s look at which costs are available to us in the first place.
The element starts out at index i, so we can increase it a few times with a cost of B_i before the first cyclic shift.
After the first cyclic shift, it will be at position i+1, so we can increase it a few times with a cost of B_{i+1} as well.
Then another cyclic shift happens, and we have the cost B_{i+2} available to us, and so on.

In general, we’ll have all the costs B_i, B_{i+1}, \ldots, B_{(i+T)\bmod N} (indices taken modulo N) available to us, and each can be used as many times as we like.
Our goal is to minimize the cost of D_{(i+T)\bmod N} additions.
Clearly, it’s optimal to just use the cost of \min(B_i, B_{i+1}, \ldots, B_{(i+T)\bmod N}) for every addition to this element.

That is, the overall cost incurred by element i is exactly D_{(i+T)\bmod N} \cdot \min(B_i, B_{i+1}, \ldots, B_{(i+T)\bmod N}).

Now, it’s easy to see that this cost can be computed independently for each element i and added up, since they don’t really interfere with each other (given that the total number of cyclic shifts is fixed).

So, for a fixed value of T, the minimum cost can be computed in \mathcal{O}(N^2) time easily: for each i, find the minimum element of the corresponding range of B in linear time, and then add the appropriate term to the answer.


Now, observe that we don’t actually have many choices for the value of T, i.e. the number of cyclic shifts.
In particular, if T \ge N, we could just perform (N-1) cyclic shifts instead: the shift cost is no higher, and in both cases every element has the full set of costs B_0, \ldots, B_{N-1} available, so every addition can use the global minimum of B and the total addition cost is the same either way.

Thus, we only need to check for T = 0, 1, \ldots, N-1, meaning N choices in total.

Since each value of T is checked in quadratic time, the algorithm is \mathcal{O}(N^3) overall, which is too slow for the constraints.

To optimize this, observe that one slow part is the computation of \min(B_i, B_{i+1}, \ldots, B_{(i+T)\bmod N}), i.e. the minimum of some range of B (or, if the range wraps around, the minimum of some prefix and some suffix combined).
Rather than compute this in linear time, we can simply store the minimums of all subarrays of B in quadratic time, and then just look up the one we need in constant time.

There are alternative solutions too, for instance:

  1. Use a range minimum data structure such as a sparse table or segment tree.
  2. Use a sliding window minimum algorithm to compute the answer for all windows of fixed length in linear or \mathcal{O}(N\log N) time.
  3. Use dynamic programming, as follows:
    Note that the “windows” for an element i with respect to T and T+1 differ by only a single element, namely one extra element appended to the end of the window.
    So, we can maintain dp_i to be the minimum of the window starting at i; and when moving from T to T+1, simply update all the values of dp_i with the new element which takes linear time.
    You might notice that this is fundamentally equivalent to just computing the minimum of all (cyclic) subarrays on-the-fly, rather than precomputing them as mentioned earlier.

TIME COMPLEXITY:

\mathcal{O}(N^2) or \mathcal{O}(N^2 \log N) per testcase.

CODE:

Editorialist's code (C++)
#include <bits/allocator.h>
#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, c; cin >> n >> c;
        vector a(n, 0), b(n, 0), d(n, 0);
        for (int &x : b) cin >> x;
        for (int &x : d) cin >> x;

        vector mn(n, vector(n, INT_MAX));
        for (int i = 0; i < n; ++i) {
            mn[i][i] = b[i];
            for (int j = i+1; j < n; ++j) {
                mn[i][j] = min(b[j], mn[i][j-1]);
            }
        }
        auto getmin = [&] (int L, int R) {
            if (L <= R) return mn[L][R];
            else return min(mn[0][R], mn[L][n-1]);
        };

        ll ans = 1e18;
        for (int shift = 0; shift < n; ++shift) {
            ll cur = 1ll*shift*c;
            for (int i = 0; i < n; ++i) {
                int reach = (i + shift) % n;
                
                int best = getmin(i, reach);
                cur += 1ll*best*(d[reach] - a[i]);
            }
            ans = min(ans, cur);
        }
        cout << ans << '\n';
    }
}
Tester's code (C++)
#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

void Solve() 
{
    int n, c; cin >> n >> c;
    
    vector <int> b(n + 1), d(n + 1);
    for (int i = 1; i <= n; i++){
        cin >> b[i];
    }
    for (int i = 1; i <= n; i++){
        cin >> d[i];
    }
    
    vector <int> dp(n + 1, INF);
    int ans = INF;
    for (int s = 0; s < n; s++){
        int curr = s * c;
        for (int i = 1; i <= n; i++){
            int r = i - s;
            if (r < 1) r += n;
            dp[i] = min(dp[i], b[r]);
            
            curr += dp[i] * d[i];
        }
        ans = min(ans, curr);
    }
    
    cout << ans << "\n";
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}