MAXHAMDIST - Editorial

iceknight1093 · September 18, 2024, 4:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: klu_2100031675
Tester: tabr
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

None

PROBLEM:

You are given N strings of length M, all containing 0, 1, or ?.
The question marks can be replaced with either 0 or 1.
Find the maximum possible sum of Hamming distances between all pairs of the N strings, after replacement.

EXPLANATION:

Let’s try to solve a simple version of the problem first: what if M = 1, i.e, every string had length 1?

After replacing the ? with zeros and ones, suppose we end up with x zeros and y ones.
Our goal is to maximize the sum of Hamming distances, meaning each pair of strings will be compared against each other.
Note that:

Each pair of (0, 1) will add 1 to the answer.
Each pair of (0, 0) and (1, 1) will add 0 to the answer.

So, the sum of all Hamming distances is just the number of pairs of strings with different values: namely, x\cdot y.

There are a couple of ways to maximize this quantity:

Simply use brute force.
Since x+y = N, fixing x also fixes y.
Try every value of x, compute the corresponding value of y, and take the maximum value of (x\cdot y) among them.
Note that you will need to do a couple of sanity checks: x cannot be smaller than the number of already existing zeros, and y cannot be smaller than the number of existing ones; everything else is ok.
Since x+y = N, the product x\cdot y is maximized when x and y are as close to each other as possible.
So, while there exists a ? still, you can greedily increase the minimum of x and y by 1 (meaning, if there are less zeros we turn one ? into a 0, otherwise we turn one ? into a 1).

Both methods are valid, and take \mathcal{O}(N) time each.
The second can in fact be done in \mathcal{O}(1) using a bit of casework, but you need \mathcal{O}(N) time to find x and y anyway so it isn’t actually a speedup.

Now, let’s move to the more general version, with arbitrary M.
Observe that the Hamming distance is essentially independent on each column: meaning, what decisions we make on the first column don’t affect any of the other columns; our decisions on the second column don’t affect the others, and so on.

In other words, each column is an independent problem with M = 1.
We already know how to solve that!
Simply applying the solution of M = 1 to every column, and summing up the answers, gives us a solution in \mathcal{O}(N\cdot M) time (\mathcal{O}(N) per column) which is fast enough.

TIME COMPLEXITY:

\mathcal{O}(N\cdot M) per testcase.

CODE:

Editorialist's code (Python)

for _ in range(int(input())):
    n, m = map(int, input().split())
    s = [input() for i in range(m)]
    ans = 0
    for i in range(n):
        x, y = 0, 0
        for j in range(m):
            x += s[j][i] == '0'
            y += s[j][i] == '1'
        while x+y < m:
            if x < y: x += 1
            else: y += 1
        ans += x*y
    print(ans)

aupo14 · September 18, 2024, 5:41pm

CodeChef: Practical coding for everyone why this is giving TLE, all the test cases passed not the last one

rds98 · September 18, 2024, 8:24pm

Most likely your input is slow try optimizing this:

vector<string>v[m];
for(int i = 0;i<m;i++) {
  string s;
  cin>>s;
  v[i].push_back(s);
}

iceknight1093 · September 19, 2024, 5:54am

Looks like it’s because of the use of endl.
endl is really slow because it flushes the buffer every time it’s called (which has the additional effect of effectively disabling your fast input).
Just use '\n' instead.

aupo14 · September 19, 2024, 12:59pm

Thanks a lot…@iceknight1093