JUSTTWO - Editorial

iceknight1093 · August 21, 2024, 4:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: jha_rishi
Tester: nskybytskyi
Editorialist: iceknight1093

DIFFICULTY:

Simple

PREREQUISITES:

Heaps/multisets, binary exponentiation

PROBLEM:

You have an array A.
Exactly K times, you must choose an element A and multiply it by 2.
Find the minimum possible final sum of the array.

EXPLANATION:

A natural first strategy to consider is to repeatedly multiply the smallest element by 2.
This simple greedy strategy is indeed correct.

Proof

First, note that the order of operations doesn’t matter much: only how many times we double each element.

Now, suppose we operate on A_i, and let A_j be some other element such that in the end, A_i \gt 2A_j.
We could then instead use one less operation on A_i and one more operation on A_j, which will give us a lower sum because

A_i + A_j \gt \frac{A_i}{2} + 2A_j \iff \frac{A_i}{2} \gt A_j \iff A_i \gt 2A_j

This tells us that in an optimal solution, if we ever operate on an element A_i, it cannot be more than twice any other element in the end.

This structure is exactly what the greedy algorithm achieves.

However, K can be quite large, so directly implementing this will definitely not be fast enough (let alone the fact that the numbers will quickly get too large to even compare).

Instead, let’s perform a direct simulation for just the first few operations: in particular, perform the operations while K \gt 0 and there exists some A_i \leq 10^9.
Note that in the worst case, we perform N\cdot \log_2(10^9) operations here: each element can be doubled at most 30 times before it exceeds 10^9.
To simulate quickly, we want a data structure that allows us to get the smallest element, delete it, and insert twice of it quickly.
This is easily done by a priority queue or multiset.
Note that this simulation, with a priority queue/multiset, is already enough to solve the first subtask since K \leq N.

Once we’ve performed the first few operations as above, let’s now sort the array, so that A_i \leq A_{i+1}.
Every element is \gt 10^9, so we observe that the process will continue as follows:
First double A_1, then A_2, then A_3, and so on till we double A_N.
Then repeat: that is, double A_1, A_2, \ldots, A_N in order again, and repeat.

Why?

Notice that once all the elements are \gt 10^9, we definitely would’ve used at least one move on each of them.
This means that for any A_i and A_i, A_i \leq 2\cdot A_j will hold, as seen in the initial proof of correctness.

So, at this stage, whenever we double an element it’ll become larger than everything else.
That means:

A_1 gets doubled, and becomes larger than everything else. A_2 is now the smallest.
A_2 gets doubled, and becomes larger than everything else. A_3 is now the smallest.
\vdots
A_N gets doubled, and becomes larger than everything else.
Since everything was multiplied by 2 exactly once, the process simply repeats.

This predictable pattern allows us to optimize the operations: each index will receive \left\lfloor \frac{K}{N} \right\rfloor of the remaining operations, and then the smallest (K\bmod N) of them will receive one extra operation.

Each operation corresponds to a multiplication by 2, so the final value at index i will be A_i multiplied by some power of 2.
This can be computed quickly using binary exponentiation.

TIME COMPLEXITY:

\mathcal{O}(N\log N\log\max A) per testcase.

CODE:

Editorialist's code (C++)

// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, k; cin >> n >> k;
        vector<int> a(n);
        for (int &x : a) cin >> x;

        priority_queue<int, vector<int>, greater<int>> pq;
        for (int i = 0; i < n; ++i) pq.push(a[i]);

        const int lim = (1 << 30);
        while (k > 0) {
            if (pq.top() >= lim) break;
            int x = pq.top(); pq.pop();
            pq.push(2*x);
            --k;
        }

        while (pq.size() > 0) {
            a[pq.size()-1] = pq.top();
            pq.pop();
        }
        reverse(begin(a), end(a));

        ll ans = 0;
        const int mod = 1e9 + 7;
        auto pow2m = [&] (int x) {
            ll res = 1, a = 2;
            while (x) {
                if (x & 1) res = (res * a) % mod;
                a = (a * a) % mod;
                x /= 2;
            }
            return res;
        };
        for (int i = 0; i < n; ++i) {
            if (i < k%n) ans += a[i] * pow2m(k/n + 1);
            else ans += a[i] * pow2m(k/n);
            ans %= mod;
        }
        cout << ans << '\n';
    }
}

Tester's code (C++)

#include <bits/stdc++.h>
using namespace std;

// Alternatively, you could use atcoder::modint
constexpr int mod = 1'000'000'007;
int modpow(int64_t base, int exponent) {
    int64_t result = 1;
    while (exponent) {
        if (exponent & 1) {
            result *= base;
            result %= mod;
        }
        base *= base;
        base %= mod;
        exponent >>= 1;
    }
    return result;
}

int solve(int n, int k, const vector<int>& a) {
    const auto mx = *max_element(a.cbegin(), a.cend());
    priority_queue<int, vector<int>, greater<>> pq = {a.cbegin(), a.cend()};
    // This loop makes O(log(mx)) operations per element
    // This is because x * 2^log(mx) = x * mx >= mx as x >= 1
    // O(n * log(mx)) iterations take O(n * log(n) * log(mx)) time
    while (k && 2 * pq.top() < mx) {
        --k;
        // Python has heapq.heapreplace
        pq.push(2 * pq.top());
        pq.pop();
    }

    int64_t sum = 0;
    // No need to recompute 2^(k/n) for every element
    const int64_t pow = modpow(2, k / n);
    while (!pq.empty()) {
        if (k % n) {
            --k;
            // Beware of overflow
            sum += 2 * pow * pq.top(); 
        } else {
            // Beware of overflow
            sum += pow * pq.top();
        }
        pq.pop();
        sum %= mod;
    }
    return sum;
}

int main() {
    cin.tie(0)->sync_with_stdio(0);
    // Separating IO and solution is a good practice
    int t; cin >> t; while (t--) {
        int n, k;
        cin >> n >> k;
        vector<int> a(n);
        for (auto& ai : a) {
            cin >> ai;
        }
        cout << solve(n, k, a) << '\n';
    }
}

saiumesh23 · August 21, 2024, 6:24pm

can you discuss time complexity of this q

saiumesh23 · August 21, 2024, 6:27pm

got it thank you

pkkrpacific · August 21, 2024, 10:49pm

Why the code is not giving tle on test
1
200000 1000000000
1,…n-1times, 1000000000
This test case is taking 2.8 sec