PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: jha_rishi
Tester: nskybytskyi
Editorialist: iceknight1093
DIFFICULTY:
Simple
PREREQUISITES:
Heaps/multisets, binary exponentiation
PROBLEM:
You have an array A.
Exactly K times, you must choose an element A and multiply it by 2.
Find the minimum possible final sum of the array.
EXPLANATION:
A natural first strategy to consider is to repeatedly multiply the smallest element by 2.
This simple greedy strategy is indeed correct.
Proof
First, note that the order of operations doesnât matter much: only how many times we double each element.
Now, suppose we operate on A_i, and let A_j be some other element such that in the end, A_i \gt 2A_j.
We could then instead use one less operation on A_i and one more operation on A_j, which will give us a lower sum because
This tells us that in an optimal solution, if we ever operate on an element A_i, it cannot be more than twice any other element in the end.
This structure is exactly what the greedy algorithm achieves.
However, K can be quite large, so directly implementing this will definitely not be fast enough (let alone the fact that the numbers will quickly get too large to even compare).
Instead, letâs perform a direct simulation for just the first few operations: in particular, perform the operations while K \gt 0 and there exists some A_i \leq 10^9.
Note that in the worst case, we perform N\cdot \log_2(10^9) operations here: each element can be doubled at most 30 times before it exceeds 10^9.
To simulate quickly, we want a data structure that allows us to get the smallest element, delete it, and insert twice of it quickly.
This is easily done by a priority queue or multiset.
Note that this simulation, with a priority queue/multiset, is already enough to solve the first subtask since K \leq N.
Once weâve performed the first few operations as above, letâs now sort the array, so that A_i \leq A_{i+1}.
Every element is \gt 10^9, so we observe that the process will continue as follows:
First double A_1, then A_2, then A_3, and so on till we double A_N.
Then repeat: that is, double A_1, A_2, \ldots, A_N in order again, and repeat.
Why?
Notice that once all the elements are \gt 10^9, we definitely wouldâve used at least one move on each of them.
This means that for any A_i and A_i, A_i \leq 2\cdot A_j will hold, as seen in the initial proof of correctness.
So, at this stage, whenever we double an element itâll become larger than everything else.
That means:
- A_1 gets doubled, and becomes larger than everything else. A_2 is now the smallest.
- A_2 gets doubled, and becomes larger than everything else. A_3 is now the smallest.
\vdots - A_N gets doubled, and becomes larger than everything else.
Since everything was multiplied by 2 exactly once, the process simply repeats.
This predictable pattern allows us to optimize the operations: each index will receive \left\lfloor \frac{K}{N} \right\rfloor of the remaining operations, and then the smallest (K\bmod N) of them will receive one extra operation.
Each operation corresponds to a multiplication by 2, so the final value at index i will be A_i multiplied by some power of 2.
This can be computed quickly using binary exponentiation.
TIME COMPLEXITY:
\mathcal{O}(N\log N\log\max A) per testcase.
CODE:
Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());
int main()
{
ios::sync_with_stdio(false); cin.tie(0);
int t; cin >> t;
while (t--) {
int n, k; cin >> n >> k;
vector<int> a(n);
for (int &x : a) cin >> x;
priority_queue<int, vector<int>, greater<int>> pq;
for (int i = 0; i < n; ++i) pq.push(a[i]);
const int lim = (1 << 30);
while (k > 0) {
if (pq.top() >= lim) break;
int x = pq.top(); pq.pop();
pq.push(2*x);
--k;
}
while (pq.size() > 0) {
a[pq.size()-1] = pq.top();
pq.pop();
}
reverse(begin(a), end(a));
ll ans = 0;
const int mod = 1e9 + 7;
auto pow2m = [&] (int x) {
ll res = 1, a = 2;
while (x) {
if (x & 1) res = (res * a) % mod;
a = (a * a) % mod;
x /= 2;
}
return res;
};
for (int i = 0; i < n; ++i) {
if (i < k%n) ans += a[i] * pow2m(k/n + 1);
else ans += a[i] * pow2m(k/n);
ans %= mod;
}
cout << ans << '\n';
}
}
Tester's code (C++)
#include <bits/stdc++.h>
using namespace std;
// Alternatively, you could use atcoder::modint
constexpr int mod = 1'000'000'007;
int modpow(int64_t base, int exponent) {
int64_t result = 1;
while (exponent) {
if (exponent & 1) {
result *= base;
result %= mod;
}
base *= base;
base %= mod;
exponent >>= 1;
}
return result;
}
int solve(int n, int k, const vector<int>& a) {
const auto mx = *max_element(a.cbegin(), a.cend());
priority_queue<int, vector<int>, greater<>> pq = {a.cbegin(), a.cend()};
// This loop makes O(log(mx)) operations per element
// This is because x * 2^log(mx) = x * mx >= mx as x >= 1
// O(n * log(mx)) iterations take O(n * log(n) * log(mx)) time
while (k && 2 * pq.top() < mx) {
--k;
// Python has heapq.heapreplace
pq.push(2 * pq.top());
pq.pop();
}
int64_t sum = 0;
// No need to recompute 2^(k/n) for every element
const int64_t pow = modpow(2, k / n);
while (!pq.empty()) {
if (k % n) {
--k;
// Beware of overflow
sum += 2 * pow * pq.top();
} else {
// Beware of overflow
sum += pow * pq.top();
}
pq.pop();
sum %= mod;
}
return sum;
}
int main() {
cin.tie(0)->sync_with_stdio(0);
// Separating IO and solution is a good practice
int t; cin >> t; while (t--) {
int n, k;
cin >> n >> k;
vector<int> a(n);
for (auto& ai : a) {
cin >> ai;
}
cout << solve(n, k, a) << '\n';
}
}