MINCOL - Editorial

iceknight1093 · March 19, 2025, 4:45pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: raysh07
Editorialist: iceknight1093

DIFFICULTY:

Cakewalk

PREREQUISITES:

Dynamic programming

PROBLEM:

You’re given an array A. You can perform the following operation:

Choose 1 \leq i \leq j \leq N such that A_i = A_j.
Then, for each index k \in [i, j], set A_k to A_i.

Find the minimum possible number of distinct elements in A after performing this operation several times.

EXPLANATION:

We’ll use dynamic programming.

Let dp_i denote the minimum number of distinct elements in the prefix of A of length i.
Note that any operation we perform cannot change A_i, so it will certainly remain in the final array.

There are now two possibilities:

We don’t operate on i at all. This gives us dp_{i-1} + 1.
We do operate on i.
Here, say we choose j \lt i and perform the operation on [j\ldots i].
Then the best we can do is dp_{j-1} + 1.

Note that both cases can be combined into just \min(dp_{j-1} + 1) across all j \leq i satisfying A_i = A_j.

One thing to take note of here is that it’s not immediately obvious why this is correct.
After all, we’re counting the number of distinct elements in A. So, is it not possible that we perform the operation on [j, i] (so the element A_i is present in the array), but then dp_{j-1} also counts the element A_i somewhere within it?
As it turns out, we don’t need to worry about such a thing at all.

Why?

Suppose we’ve performed some operations, and in the end there are two disjoint segments with the same color.
We could then choose one element from each of those segments, and color everything between them with this color. This will not increase the number of distinct colors (and might even reduce it) so it’s always optimal to do so.

Since we’re considering all possibilities for the operation to perform, the optimal solution will definitely be one among them - and for this optimal solution no overcounting will happen so the final value of dp_i will be correct.

We now have a correct algorithm, which is to just set dp_i = \min(dp_{j-1} + 1) across all j \leq i with A_i = A_j.
This, if implemented directly, will take quadratic time though and is too slow.

To optimize it, we can also store information by value.
That is, define \text{best}[x] to be the minimum value of dp_{i-1} across all indices i processed so far, such that A_i = x.
Then, to process index i, we can simply do the following:

First, set \text{best}[A_i] to \min(\text{best}[x], dp_{i-1}).
Then, set dp_i = \text{best}[A_i] + 1.

That’s it!
This now runs in linear time, and the answer is dp_N.
Note that A_i \leq N means you can use an array of length N to store \text{best}, instead of having to use a map.

TIME COMPLEXITY:

\mathcal{O}(N) per testcase.

CODE:

Author's code (C++)

#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

void Solve() 
{
    int n; cin >> n;
    
    vector <int> a(n + 1);
    for (int i = 1; i <= n; i++){
        cin >> a[i];
    }
    
    vector <int> dp(n + 1, INF);
    dp[0] = 0;
    
    vector <int> e(n + 1, INF);
    for (int i = 1; i <= n; i++){
        // transport dp[i - 1] to e[a[i]] 
        e[a[i]] = min(e[a[i]], dp[i - 1]);
        
        // end something here 
        dp[i] = min(dp[i], e[a[i]] + 1);
    }
    
    cout << dp[n] << "\n";
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}

Editorialist's code (PyPy3)

for _ in range(int(input())):
    n = int(input())
    a = list(map(int, input().split()))
    dp = [0]*n
    
    best = [n+1]*(n+1)
    for i in range(n):
        if i > 0:
            best[a[i]] = min(best[a[i]], dp[i-1])
        else:
            best[a[i]] = 0
        dp[i] = 1 + best[a[i]]
    print(dp[n-1])

theoneim31 · March 19, 2025, 5:24pm

CAKEWALK!! seriously? Need to level up ig

dhiva_07 · March 20, 2025, 5:09am

The difficulty tags may not be appropriate for all the users , after all some of the users have already spent hours on doing similar problems , yet not in all cases , take my example , i find it decently tough as well , also look at the difficulty rating that the problem has got … its 2050 (which is indeed high)… So don’t worry at all…

bhavdeepnijhaw · March 20, 2025, 12:43pm

First Testcase:
3
1 2 1

Output:
1 1 1

A_i changed from “2” to “1”. Where going wrong in the interpretation of this statement?

Thank you for the help.

dhiva_07 · March 20, 2025, 1:15pm

I also was thinking about what that statement really meant so I thought when is this statement even true , so I assumed that the author wanted to state that “we cannot change Ai where i = {1 , n}”. That is if the element we want to change is first element or the last element of the array , then we cannot do that whatsoever. But considering the fact that the author just mentioned about the prefix array of length i , I think he wanted to write A1 there , because the very next step explains the transitions for the entire array for all the indices i where i != 1 . So I think that has to be A1 there.

iceknight1093 · March 20, 2025, 5:43pm

Note that dp_i is defined only for the prefix of length i.
That means elements after the i-th aren’t even being considered: basically dp_i is the answer to the problem if we treat the first i elements of A as a separate array.

In the example you gave, [1, 2, 1],

dp_1 represents the answer for the array [1].
dp_2 represents the answer for the array [1, 2].
dp_3 represents the answer for the array [1, 2, 1].

When I say that it’s not possible to change A_i, note that I’m talking about this specifically in the context of computing dp_i - so A_i is essentially the last element of the array under consideration, which is why it can’t be changed (of course, A_1 also won’t change, but we aren’t using that fact here).