BINSTRAGAIN - Editorial

iceknight1093 · August 7, 2024, 5:16pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: prasant21
Preparation: raysh07
Tester: mexomerf
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Prefix sums

PROBLEM:

You’re given a (large) binary string S of length N.
You can do the following at most once.

Choose an integer K \gt 1 that divides N, and break S into substrings of length K.
Rearrange these substrings as you like, then concatenate them into a single string of length N again.

Find the maximum possible number of times 01 can appear as a subsequence.

EXPLANATION:

Let’s fix the length K of the substrings S gets broken into, and try to find which rearrangement maximizes the answer.
Let p = \frac{N}{K} be the number of substrings we have.
For the i-th of these substrings, we denote its number of ones with A_i, and the number of 01 subsequences contained within it by B_i.

Now, if we fix an order of these substrings, say i_1, i_2, \ldots, i_p, it’s not hard to see that the number of 01-subsequences we obtain is exactly

\sum_{x=1}^p \left(B_{i_x} + A_{i_x}\cdot\sum_{y=1}^{i-1} \left(K-A_{i_y}\right)\right)

This follows from the observation that for the x-th block, any 01-subsequence ending at a 1 in it either already exists within it (the B_{i_x} term), or is formed by taking a 1 in this block and a 0 from an earlier block.

The above expression is clearly maximized when the A_i are taken in ascending order.

So, all we need to do is to be able to compute the values of A_i and B_i quickly enough.
This can be done with the help of prefix sums.

How?

A_i denotes the number of ones within some range of the array, which is straightforward with prefix sums.

As for the B_i values: note that we don’t really care much about individual B_i values, we only really want all their sums.
To that end, let’s define an auxiliary array L, where L_i denotes the number of zeros to the left of index i.
Then, it can be seen that the number of 01 subsequences within the block [i, i+K) is simply the sum of (L_j - L_i) across all indices j (i \leq j \lt i+K) such that S_j = 1.

Expanding the summation, this is the sum of all L_j of ones within the block, minus L_i multiplied by the number of ones within the block.
The latter term is easy to compute (we already have prefix sums to compute the number of ones within a block), while the former, summed up across all blocks, simply equals the sum of all L_j over all indices j with S_j = 1 (this is a constant, and so can be precomputed).

This way, we have all the A_i values, and the sum of all B_i values, which is what we wanted.

Now that we know how to compute the answer for a fixed K, we need to decide which K to choose.
K must be a divisor of N - and for the first subtask, simply trying every divisor (and applying the above solution to each one) will get you AC.

We can, however, go a bit further.
Notice that it’s enough to consider only prime divisors of N as candidates for K.
This is because, if p divides K (and p \lt K), breaking into blocks of size p gives us strictly more freedom than breaking into blocks of length K — after all, one block of size K is formed from \frac{K}{p} blocks of size p, and rearrangement is free.
So, we only need to consider divisors of N that have no smaller factors (1 aside), which is exactly the prime divisors.

As it turns out, doing this improves the complexity a fair bit!

Complexity analysis

Let’s see what happens for a fixed prime p.

First, we compute the A_i and B_i values.
These can be done using prefix sums, so the complexity is \mathcal{O}(\frac{N}{p}).
Then, we need to sort the A_i values.
This can be done in \mathcal{O}(p + \frac{N}{p}) time using counting sort.
Finally, we use the sorted values to compute the answer.
This is again \mathcal{O}(\frac{N}{p}).

The overall complexity is hence \mathcal{O}(p + \frac{N}{p}).

Looking at each of its terms separately:

p here refers to a prime factor of N.
The sum of all distinct prime factors of N cannot exceed N, so this part summed up across all p is \mathcal{O}(N).
This leaves us to estimate the sum of \frac{N}{p} across all p.
Let’s take the factor of N out, and estimate the sum of \frac{1}{p} instead.

We’re looking at the distinct prime factors of N, and there can be at most \log N of them.
The sum of reciprocals of primes is clearly maximized when the primes are as small as possible - that is, the smallest \log N primes are chosen.

By the prime number theorem, the \log N-th prime number is approximately \log N \log\log N.
It is also known that the sum of the reciprocals of the primes exhibits log-log growth, so with our bound on the largest prime that needs to be considered, we obtain a bound of

\mathcal{O}(\log(\log \left(\log (N) \cdot \log(\log N))) \right)

on the sum of the reciprocals of primes we care about.

In practice, you can verify that we only need to consider primes \leq 19, and the sum of the reciprocals of the primes till there is quite small - it’s even less than 1.5.

Note that the above complexity analysis relies on the fact that we sort the blocks of length p in \mathcal{O}(p + \frac{N}{p}) time using counting sort - after all, we’re really sorting the number of ones within each block, and each of those values cannot exceed p.
If you use a slower sorting algorithm, even if it’s \mathcal{O}(\frac{N}{p}\log \frac{N}{p}), there’s a decent chance you will receive TLE.

An algorithm that runs in \mathcal{O}(N\cdot \omega(N)) might also pass if implemented well.

TIME COMPLEXITY:

\mathcal{O}(N\log(\log \left(\log (N) \cdot \log(\log N))) \right) per testcase.

CODE:

Editorialist's code (C++)

#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, m; cin >> n >> m;
        basic_string<int> pref(30*m+1, 0), zeros(30*m + 1, 0);
        int pos = 1;
        for (int i = 0; i < m; ++i) {
            int x; cin >> x;
            for (int j = 0; j < 30; ++j) {
                pref[pos] = (x >> j) & 1;
                zeros[pos] = 1 - pref[pos];

                pref[pos] += pref[pos-1];
                zeros[pos] += zeros[pos-1];
                ++pos;
            }
        }

        ll ans = 0, allsm = 0;
        for (int i = 1; i <= n; ++i) {
            if (pref[i] != pref[i-1]) allsm += zeros[i];
        }

        auto process = [&] (int p) {
            ll sub = 0;
            basic_string<int> freq(p+1, 0);
            for (int i = 1; i <= n; i += p) {
                int ones = pref[i+p-1] - pref[i-1];
                sub += 1ll*ones*zeros[i-1];
                ++freq[ones];
            }
            ll curans = allsm - sub;
            int before = p*freq[0];

            for (int i = 1; i <= p; ++i) {
                curans += 1ll*i*(freq[i]*(2*before + 1ll*(freq[i] - 1)*(p-i)))/2;
                before += 1ll*freq[i]*(p-i);
            }
            ans = max(ans, curans);
        };
        int x = n;
        for (int p = 2; p*p <= x; ++p) {
            if (x%p) continue;
            process(p);
            while (x%p == 0) x /= p;
        }
        if (x > 1) process(x);
        cout << ans << '\n';
    }
}

smilodon · August 7, 2024, 7:37pm

what is meant by ω(N) ?

nskybytskyi · August 8, 2024, 7:39am

Prime omega function.