BINSUBGR - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: iceknight1093
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

Simple

PREREQUISITES:

None

PROBLEM:

A binary string A of length N is called good if there exists an integer K such that:

  • 1 \lt K \lt N and K divides N.
  • Writing A out into a grid of dimensions K\times \frac{N}{K} in row-major order, each column of the grid contains at least one 1 and at least one 0.

A binary string is beautiful if it contains at least one good subsequence.

Given a binary string, compute the length of its longest non-beautiful substring.

EXPLANATION:

First, we need to understand which binary strings are good/beautiful.

It seems quite hard to even check for a binary string even being good, especially if it’s long: we seemingly have no choice but to iterate through all its divisors and generate the appropriate matrix.
This suggests that there must be a simpler condition to check for a binary string being beautiful.

First, we make a simple observation: if a binary string is good, it must have at least two zeros and at least two ones.
This is because 1 \lt K \lt N forces the resulting grid to have at least two columns; and each column must contain both a 0 and a 1.

So, if a string contains \leq 1 occurrences of either 0 or 1, it cannot be good for sure.
Note that this means such a string also cannot be beautiful: after all, if it has \leq 1 occurrences of 0, any of its subsequences will satisfy that condition too.
Note that this immediately tells us that strings of length \leq 3 cannot be beautiful.

This leaves strings that have \geq 2 occurrences of both 0 and 1.
It turns out that such strings are (almost) always beautiful!
Specifically:

  • Any binary string of length \geq 5 that contains at least two occurrences of both 0 and 1 is beautiful.
  • For length 4, the strings 0011, 1100, 1001, and 0110 are beautiful, while the rest are not.
Proof

For length 4, the strings 0011, 1100, 1001, 0110 are beautiful because the correspond to the following 2\times 2 grids:

\begin{bmatrix} > 0 & 0 \\ > 1 & 1 > \end{bmatrix} > \hspace{25pt} > \begin{bmatrix} > 1 & 1 \\ > 0 & 0 > \end{bmatrix} > \hspace{25pt} > \begin{bmatrix} > 1 & 0 \\ > 0 & 1 > \end{bmatrix} > \hspace{25pt} > \begin{bmatrix} > 0 & 1 \\ > 1 & 0 > \end{bmatrix}

For lengths \geq 5, we’ll show that any string with at least two zeros must contain one of these four as a subsequence.

Without loss of generality, let S_1 = 0.
If S_2 = 0 as well, then because S contains at least two 1's, it’ll have 0011 as a subsequence and we’re done.

This leaves us with S_1 = 0 and S_2 = 1.
S contains at least one more 0 and at least one more 1 among its remaining N-2 elements.
Further, since N \geq 5, at least one character has \geq 2 occurrences remaining.
We now have two cases:

  1. The remaining N-2 elements are sorted, i.e. look like 00\ldots 0011\ldots 11.
    This means the string looks like 0100\ldots 0011\ldots 11.
    Here,
  • If there are \geq 2 occurrences of 0 in the sorted part, we have 1001 as a subsequence.
  • Otherwise, there are \geq 2 occurrences of 1, and we have 0011 as a subsequence.
  1. The remaining N-2 elements are not sorted.
    This means they contain 10 as a subsequence, which forms 0110 along with the first two characters.

So, any string of length \geq 5 containing at least two zeros and at least two ones has a good subsequence of length 4, and hence it beautiful.


We can now use this characterization to compute the answer.

If N \leq 3, the answer is just N, so we work with N \geq 4.

First, any substring of length 3 is not beautiful, so the answer is at least 3.
Next, let’s look at substrings of length 4 - there are N-3 of them.
If any of these substrings equal anything other than \{0011, 1100, 1001, 0110\}, we have a length 4 substring that’s not beautiful.

Finally we need to check lengths that are \geq 5.
From above, we know that such substrings are not beautiful if and only if they contain at most one occurrence of either 0 or 1.

This allows us the following check:

  • Fix the left endpoint L.
  • Let R_1 \geq L be the index of the second occurrence of 0 at or after L.
    (If there aren’t two 0's after L, take R_1 = N+1.)
  • Similarly, let R_2 \geq L be the index of the second occurrence of 0 at or after L.
  • Then, note that any subarray of length \geq 5 ending at or after \max(R_1, R_2) will have at least two zeros and at least two ones, and so be beautiful.
  • So, the longest non-beautiful subarray starting at index L, ends at \max(R_1, R_2) - 1.
    Thus, we can maximize the answer with \max(R_1, R_2) - L.

To quickly compute the values of R_1 and R_2, there are several different methods.
For example:

  1. Store a list of indices that contain zeros.
    Binary search on this list to find the first zero at an index \ge L, and then take the next element in the list to obtain R_1.
    Similarly store a list corresponding to ones and binsearch on that to compute R_2.
  2. Alternately, you can preprocess the string to store the next occurrence of a 0 and 1 after each index, and then look up these next occurrences twice to get R_1 and R_2.

TIME COMPLEXITY:

\mathcal{O}(N) or \mathcal{O}(N\log N) per testcase.

CODE:

Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n; cin >> n;
        string s; cin >> s;

        vector<array<int, 2>> prv(n);
        for (int i = 0; i < n; ++i) {
            prv[i] = {-1, -1};
            if (i > 0) {
                prv[i] = prv[i-1];
                prv[i][s[i-1] - '0'] = i-1;
            }
        }

        int ans = min(n, 3); // 3 is always possible
        for (int i = 3; i < n; ++i) {
            // Check length 4
            string tmp = s.substr(i-3, 4);
            if (tmp == "1010" or tmp == "0101") ans = max(ans, 4);

            // Check larger lengths
            int x = s[i] - '0';
            int p0 = prv[i][x];
            int p1 = prv[i][x^1];
            if (p1 >= 0 and p0 >= 0) {
                int p2 = prv[p1][x^1];
                // [p2+1...i]
                ans = max(ans, i-min(p0, p2));
            }
            else ans = max(ans, i+1);
        }
        cout << ans << '\n';
    }
}