PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: raysh07
Tester: iceknight1093
Editorialist: iceknight1093
DIFFICULTY:
Simple
PREREQUISITES:
Sets, greedy
PROBLEM:
A sequence B of even length is called good if B_{2i-1} = B_{2i} for each valid i.
Given an array A, find the length of its longest good subsequence.
EXPLANATION:
Our goal is essentially to find as many disjoint pairs of equal elements that we can, where two pairs of indices (i_1, i_2) and (j_1, j_2) are called disjoint if i_2 \lt j_1 or j_2 \lt i_1.
It turns out that a simple greedy algorithm works to do this.
Let r_1 denote the smallest index with a repeated element.
That is, all the elements A_1, A_2, \ldots, A_{r_1 - 1} are distinct, whereas A_{r_1} is equal to one of the elements before it; say l_1 \lt r_1 is such that A_{l_1} = A_{r_1}.
Then, there exists an optimal solution where we take the pair (l_1, r_1) into the subsequence.
Proof
The proof is via a simple exchange argument: suppose we don’t take this pair, and let the first pair we choose be (x_1, y_1) instead.
Then certainly we must have y_1 \gt r_1, since we know all elements before r_1 are distinct.
But in that case, we could just replace (x_1, y_1) with (l_1, r_1) instead, since it won’t conflict with any future pairs chosen either while the number of chosen pairs remains the same.
Applying this exchange to an optimal solution will thus give us another optimal solution that does have (l_1, r_1) as desired.
Once (l_1, r_1) has been chosen, any future pairs surely cannot involve indices \le r_1.
This means we’re basically starting fresh from index r_1 + 1.
However, the exact same logic applies now too: let r_2 \gt r_1 be the first index such that the values at indices r_1+1, r_1+2, \ldots, r_2-1 are all distinct while r_2 contains a duplicate element (duplicate only among indices after r_1, of course.)
Then it’s optimal to choose r_2 with the corresponding index l_2 before it as our second pair.
The same applies to the third pair after r_2, and so on.
We now need to implement this algorithm quickly.
One way of doing that easily is to use a set, as follows.
- Let S be a set, initially empty.
- For i = 1, 2, 3, \ldots, N, do the following:
- If A_i \in S, add 2 to the answer and set S = \{\ \}, i.e. clear all elements from S.
This corresponds to finding a matching pair of elements; so we pair them and start afresh by discarding all existing elements. - If A_i \not\in S, then insert A_i into S and continue on.
- If A_i \in S, add 2 to the answer and set S = \{\ \}, i.e. clear all elements from S.
Each element is inserted into S at most once, and so deleted from S also at most once.
Thus there are \mathcal{O}(N) set operations, giving a complexity of \mathcal{O}(N\log N).
TIME COMPLEXITY:
\mathcal{O}(N\log N) per testcase.
CODE:
Editorialist's code (PyPy3)
for _ in range(int(input())):
n = int(input())
a = list(map(int, input().split()))
s = set()
ans = 0
for x in a:
if x in s:
ans += 2
s = set()
else:
s.add(x)
print(ans)