CPYD - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: pols_agyi_pols
Tester: kingmessi
Editorialist: iceknight1093

DIFFICULTY:

Easy

PREREQUISITES:

None

PROBLEM:

You have an array A. You can do the following:

  • Choose a non-empty subarray of A.
  • Keep the leftmost occurrence of each element in it, and delete everything else.
  • The cost of this operation is the number of distinct elements in A.

Find the minimum cost of making all the elements of A distinct.

EXPLANATION:

We’re unable to delete the leftmost occurrence of an element, so clearly the final array must consist exactly of all these leftmost occurrences.
Let l_x denote the position of the leftmost occurrence of x.

Further, note that if we ever perform two operations that intersect each other, it’s not worse to just combine them into a single one by taking the union of the segments: the set of deleted elements will not shrink, and the cost will not increase (since the cost is the number of distinct elements, and does not depend on the length).

With this in mind, let’s look at a single value x.
If x appears only once in A (or doesn’t appear at all) we ignore it for now.
If x appears \gt 1 time, we certainly need some operations to delete all the extra occurrences.

Let r_x denote the rightmost occurrence of x in the array. Note that if we choose the subarray [l_x, r_x] (or any larger subarray containing this one), every single extra occurrence of x will be deleted, and x will contribute only 1 to the overall cost.
That this is optimal to do is not hard to prove: to delete the occurrence at r_x, some previous occurrence is needed; to delete that previous occurrence another occurrence is needed, and so on till l_x is reached. But then these operations intersect, and so as noted at the start we can just take their union to obtain a not-weaker operation for not-larger cost; which means we end up including everything from l_x to r_x anyway.


So, for each x that has more than 1 occurrence (equivalently, l_x \lt r_x), we want to operate on a subarray that contains [l_x,r_x].
This gives us several candidate ranges for operations to perform.
Let’s take the union of these ranges so that we now have a pairwise disjoint set of candidate operations.

This set of operations is, in fact, optimal.

Proof

First, note that every integer x for which l_x \lt r_x appears in exactly one operation, and so contributes exactly one to the overall answer.
Clearly these cannot do any better, since at least one occurrence of x needs to be deleted in the first place.

Next, let’s look at elements x for which l_x = r_x, i.e, there’s only one occurrence.
For each of these elements, if they’re covered by one of the ranges they add 1 to the answer, otherwise they do not.
This is also necessary: if such an element is covered by a range, it means there exists some other element y which occurs both to the left and to the right of l_x; but in the end we must delete all occurrences of y other than the first - in particular meaning all occurrences of y to the right of l_x must go. This is only possible if we also choose an occurrence of y that’s to the left of l_x as well, which in turn means l_x itself will lie in the range and contribute 1 to the answer.

So, all elements that are forced to be selected have been selected, and nothing else has been selected. Hence, this must be optimal.

Finding the cost of each operation can simply be done in linear time, by just iterating through the range and adding its elements to a set, then adding the size of the set to the answer.
This works because the operations are disjoint and so the sum of their lengths won’t exceed N.

TIME COMPLEXITY:

\mathcal{O}(N \log N) per testcase.

CODE:

Editorialist's code (PyPy3)
for _ in range(int(input())):
    n = int(input())
    a = list(map(int, input().split()))
    first, last = [-1]*n, [-1]*n
    for i in range(n):
        if first[a[i] - 1] == -1: first[a[i] - 1] = i
        last[a[i] - 1] = i
    
    p = [0]*(n+1)
    for i in range(n):
        if first[i] != last[i]:
            p[first[i]] += 1
            p[last[i]+1] -= 1
    
    s = set()
    ans = 0
    for i in range(1, n):
        p[i] += p[i-1]
        if p[i] == 0:
            ans += len(s)
            s = set()
        else: s.add(a[i])
    ans += len(s)
    print(ans)
1 Like

I think in this case ans will be 1 not 2.
1
4
1121

here is my cpp code -

void solve() {
	int n;
	cin >> n;
	vi arr(n);
	mpii mpp;
	for(int i=0; i<n; i++) {
		cin >> arr[i];
	}
	int cnt = 1;
	ll ans = 0;
	for(int i=0; i<n-1; i++) {
		if(arr[i] == arr[i+1]) cnt++;
		else {
			if(cnt == 1 && mpp[arr[i]] == 1) ans++;
			else if(cnt == 1) {
				mpp[arr[i]] = 1;
			}
			else {
				ans++;
				mpp[arr[i]] = 1;
			}
			cnt = 1;
		}
	}
	if(arr[n-2] != arr[n-1] || cnt > 1) {
		if(cnt == 1 && mpp[arr[n-1]] == 1) ans++;
		else if(cnt == 1) {
			mpp[arr[n-1]]++;
		}
		else {
			ans++;
		}
	}
	cout << ans << "\n";
}

Good Problem. Really liked the concept.

No, the correct answer is 2. This is because to delete the rightmost 1, you will have to select l and r in such a way that 2 will also come in the chosen subarray.