P7BAR - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: pols_agyi_pols
Tester: kingmessi
Editorialist: iceknight1093

DIFFICULTY:

Easy

PREREQUISITES:

None

PROBLEM:

You’re given an array A.
You can perform the following operation:

  • Choose an index i (1 \le i \le N) and an integer X \ge 0.
  • Then, for each 1 \le j \le i, replace A_j with A_j \oplus X.

Find the minimum number of operations needed to make all the elements of A distinct.

EXPLANATION:

Suppose A contains a pair of equal elements - say, A_i = A_j where i \lt j.
Observe that:

  1. Performing an operation on a prefix of length \lt i will leave both A_i and A_j unchanged; so they’ll still be equal.
  2. Performing an operation on a prefix of length \ge j will XOR both A_i and A_j by the same value; so they’ll still be equal.

So, the only way we can make them different from each other is by performing an operation on a prefix of some length between i and j-1.


We’ll use the above observation to solve the problem.

Consider the first duplicate element - that is, consider the smallest index j such that there exists an index i \lt j satisfying A_i = A_j.
As we noted earlier, there certainly must be an operation performed with an index in the range [i, j-1].
(If no such j exists, all the elements are distinct and the answer is 0.)

One nice thing is that we’re free to choose the value of X for the operation as we wish.
So, let’s just choose X = 2^{100} and perform the operation on the prefix of length j-1 (which is the largest prefix we can choose that ‘separates’ i and j.)

2^{100} is way larger than any existing element of the array, and also doesn’t share any bits with existing elements.
So, this effectively just adds 2^{100} to the first j-1 elements of the array.
While our goal was to ensure that A_i \ne A_j, we’ve now achieved something stronger: for any pair (x, y) such that x \le j-1 and y \ge j, we have A_x \ne A_y; because A_x will be \ge 2^{100} while A_y will be much smaller than it.

Essentially, we’ve separated out the entire prefix of length j-1 from the array!
It’s clear that this is optimal: we had to perform such a separation anyway (to fix the issue of A_i = A_j), and we’ve simply performed the largest possible separation we can.

Now we can simply solve the problem on the remaining part of the array (that is, the suffix starting from index j.)
Simply repeating this process over and over again till we reach the end of the array will give us the answer.

Note that it’s always possible to choose a valid X to perform the operations since we can just keep choosing larger and larger ones.
For example, choose X = 2^{200} on the second operation, then X = 2^{300} on the third operation, and so on.
This way, there will never be an interaction between the values changed by the operations, i.e. we won’t accidentally create new equal pairs.


A simple method to implement the above algorithm is as follows.

  • Let S be a set that stores the currently ‘active’ elements.
    Initially, S is empty.
  • Then, for each i from 1 to N in order:
    • If A_i is not present in S, insert A_i into S.
      This means there is no duplicate copy of A_i (yet).
    • If A_i is present in S, we’ve reached the first duplicate.
      Here, increment the answer by 1; then clear out S (since previous elements no longer matter, given that they’d have been affected by the operation we’re performing.)
      The algorithm must then restart from the current element; so after clearing S, insert A_i into it again and continue on.

This implementation is \mathcal{O}(N\log N), since we insert each element into S at most once (and so delete it at most once too.)

TIME COMPLEXITY:

\mathcal{O}(N\log N) per testcase.

CODE:

Editorialist's code (PyPy3)
for _ in range(int(input())):
    n = int(input())
    a = list(map(int, input().split()))
    
    seen = set()
    ans = 0
    for x in a:
        if x in seen:
            ans += 1
            seen = set([x])
        else:
            seen.add(x)
    print(ans)