BEAUSUB - Editorial


Contest Division 3
Contest Division 2
Contest Division 1

Setter: Soumyadeep Pal
Tester: Aryan Choudhary
Editorialist: Hriday G




Dynamic Programming


Given an array A of length N and an integer K, find the length of the longest good subsequence of A. A sequence is said to be good if the number of adjacent elements which differ by value is at most K.


Any subsequence ending at position i is obtained by either starting a new subsequence at i, or by extending some subsequence ending at a previous position j (1 \leq j \lt i). Likewise, a subsequence which is k beautiful can be obtained by either starting a new subsequence at i, or by extending a k' beautiful subsequence ending at a previous position j (1 \leq j \lt i). Here k' is k if A_j = A_i, else k-1 if A_j \neq A_i.

Slow solution

Let’s try a Dynamic Programming approach where \mathrm{dp}[i][k] denotes the length of the longest k beautiful subsequence ending at the i-th element.

Trying to extend subsequences ending at each position j \lt i to include A_i, we get the transition \displaystyle \mathrm{dp}[i][k] = 1 + \max_{j < i} \mathrm{dp}[j][k'] where k' is k if A_j = A_i, else k-1 if A_j \neq A_i.

The answer will be the maximum value in the whole DP table. The initial states can be defined by filling the DP table with 0 s.

Computing the answer for each state requires \mathcal{O}(N) transitions, and there are \mathcal{O}(NK) states. Hence this solution has time complexity \mathcal{O}(N^2K), which is too high for the given constraints.

Reducing the transitions

Let’s try to reduce the number of transitions required per state from \mathcal{O}(N) to \mathcal{O}(1).

Notice that when we are computing the value of \mathrm{dp}[i][k], we only care about the maximum value \displaystyle \max_{j < i} \mathrm{dp}[j][k']. To be precise, we need two values \displaystyle \max_{\substack{j < i\\ A_j = A_i}} \mathrm{dp}[j][k] and \displaystyle \max_{\substack{j < i\\ A_j \neq A_i}} \mathrm{dp}[j][k-1].

If we can keep track of these maximum values, we can reduce the number of transitions per state to \mathcal{O}(1).

Keeping track of transitions of the first kind

Consider a sequence S of indices \displaystyle j_1 \lt j_2 \lt \ldots \lt j_m such that A_{j_1} = A_{j_2} = \ldots = A_{j_m}. For any k, it follows that \displaystyle \mathrm{dp}[j_1][k] \leq \mathrm{dp}[j_2][k] \leq \ldots \leq \mathrm{dp}[j_m][k]. In other words, \displaystyle \mathrm{dp}[j_m][k] = \max_{j \in S} \mathrm{dp}[j][k].


Consider a pair of indices j and j' \displaystyle (j \lt j', A_j = A_{j'}). Then \mathrm{dp}[j'][k] may be obtained by extending the sequence ending at j, using A_{j'} instead of A_{j}. This implies that \mathrm{dp}[j'][k] will always be equal to (if not more than) \mathrm{dp}[j][k].

Thus for transitions of this form, it suffices to try and update \mathrm{dp}[i][k] with (1 + \mathrm{dp}[last_i][k]), where last_i denotes the maximum j such that j \lt i and A_j = A_i.

Keeping track of transitions of the second kind

We can maintain a separate array, let’s call it \displaystyle \mathrm{pref}[i][k], which stores the length of the longest k-beautiful subsequence ending at some position j (j \leq i). In other words, \displaystyle \mathrm{pref}[i][k] = \max_{j \leq i} \mathrm{dp}[j][k] = \max(\mathrm{pref}[i-1][k], \mathrm{dp}[i][k]).

Then for transitions of this form, it will be enough to try and update \mathrm{dp}[i][k] with (1 + \mathrm{pref}[i-1][k-1]).

What if \mathrm{pref}[i-1][k-1] = \mathrm{dp}[j][k-1] such that A_j = A_i?

It does not matter

If a sequence is k-1 beautiful, then it is also k beautiful. The transition \mathrm{dp}[i][k] = 1 + \mathrm{pref}[i-1][k-1] uses this fact.

Putting all of it together, we now have a solution with time complexity \mathcal{O}(NK) and space complexity \mathcal{O}(NK). We can further reduce the space complexity to \mathcal{O}(N) by changing the order of iteration with careful updates, as demostrated in one of the solutions below.

A slightly different approach

Let’s define our dp states slightly differently. Let \mathrm{dp}[a][k] denote the length of the longest k beautiful subsequence ending with an element whose value is a. Initially all states are 0.

Now for each i and k (say a = A_i), we can either start a new subsequence at i and set \mathrm{dp}[a][k] = 1, or try to extend some existing subsequence (1 + \mathrm{dp}[a][k]) or \displaystyle (1 + \max_{\forall a' \neq a} \mathrm{dp}[a'][k-1]).

The second type of transition can be reduced to \mathcal{O}(1) in a way similar to the one described in the section above (maintain \mathrm{pref}[k]).

This solution also has time complexity \mathcal{O}(NK) and space complexity \mathcal{O}(NK). You may find the implementation in one of the solutions below.


\mathcal{O}(NK) per test case


Setter's Solution
using namespace std;

void solve() {
    int n, k; cin >> n >> k;
    vector<int> a(n);
    for (int i = 0; i < n; i++) {
        cin >> a[i];

    int maxi = *max_element(a.begin(), a.end()), ans = 0;
    vector<vector<int>> dp(n, vector<int>(k + 1, 1));
    vector<int> last_idx(maxi + 1, -1);
    vector<int> mx_ans(k + 1); // maximum answer upto (i-1) th index with j bad indices
    for (int i = 0; i < n; i++) {
        for (int j = k; j >= 0; j--) {
            if (j >= 1) {
                dp[i][j] = mx_ans[j - 1] + 1;
            if (last_idx[a[i]] != -1) {
                dp[i][j] = max(dp[i][j], dp[last_idx[a[i]]][j] + 1);
            mx_ans[j] = max(mx_ans[j], dp[i][j]);
            ans = max(ans, dp[i][j]);
        last_idx[a[i]] = i;
    cout << ans << '\n';

signed main() {
    ios_base :: sync_with_stdio(0); cin.tie(0); cout.tie(0);
    int t; cin >> t;
    while (t--) {
    return 0;
Tester's Solution
#!/usr/bin/env python
import os
import sys
from io import BytesIO, IOBase

def main():
    for _ in range(int(input())):
        a=[int(x)-1 for x in input().split()]
        dp=[[0]*(k+1) for _ in range(N)]
        for x in a:
            for i in range(k):

# region fastio

BUFSIZE = 8192

class FastIO(IOBase):
    newlines = 0

    def __init__(self, file):
        self._fd = file.fileno()
        self.buffer = BytesIO()
        self.writable = "x" in file.mode or "r" not in file.mode
        self.write = self.buffer.write if self.writable else None

    def read(self):
        while True:
            b =, max(os.fstat(self._fd).st_size, BUFSIZE))
            if not b:
            ptr = self.buffer.tell()
  , 2), self.buffer.write(b),
        self.newlines = 0

    def readline(self):
        while self.newlines == 0:
            b =, max(os.fstat(self._fd).st_size, BUFSIZE))
            self.newlines = b.count(b"\n") + (not b)
            ptr = self.buffer.tell()
  , 2), self.buffer.write(b),
        self.newlines -= 1
        return self.buffer.readline()

    def flush(self):
        if self.writable:
            os.write(self._fd, self.buffer.getvalue())

class IOWrapper(IOBase):
    def __init__(self, file):
        self.buffer = FastIO(file)
        self.flush = self.buffer.flush
        self.writable = self.buffer.writable
        self.write = lambda s: self.buffer.write(s.encode("ascii")) = lambda:"ascii")
        self.readline = lambda: self.buffer.readline().decode("ascii")

sys.stdin, sys.stdout = IOWrapper(sys.stdin), IOWrapper(sys.stdout)
input = lambda: sys.stdin.readline().rstrip("\r\n")

# endregion

if __name__ == "__main__":
Editorialist's Solution
 🍪 the_hyp0cr1t3
 🍪 28.07.2021 00:37:21
#ifdef W
    #include "k_II.h"
    #include <bits/stdc++.h>
    using namespace std;
#define pb emplace_back
#define all(x) x.begin(), x.end()
#define sz(x) static_cast<int32_t>(x.size())
auto chmax = [](auto& A, auto&& B) { return B > A? A = B, true : false; };
auto chmin = [](auto& A, auto&& B) { return B < A? A = B, true : false; };

const int64_t k_II = 2e18;
const int INF = 2e9, MOD = 1e9+7;
const int N = 1000 + 5;

int main() {
    ios_base::sync_with_stdio(false), cin.tie(nullptr);
    int Q; cin >> Q;

    while(Q--) []() {
        int i, n, k;
        cin >> n >> k; ++k;
        vector<int> a(n);
        for(auto& x: a) cin >> x;

        vector<int> dp(n + 1), cur(N, n), nxt(n);
        for(i = n-1; ~i; i--)
            nxt[i] = cur[a[i]], cur[a[i]] = i;

        while(k--) {
            for(i = n; i; i--)
                chmax(dp[i], dp[i-1] + 1);
            for(i = 0; i < n; i++) {
                if(nxt[i] < n)
                    chmax(dp[nxt[i] + 1], dp[i + 1] + 1);
                chmax(dp[i + 1], dp[i]);

        cout << *max_element(all(dp)) << '\n';

} // ~W

Feel free to share your approach. Suggestions are welcomed.


Why can’t codechef editorial be like how leetcode editorial are :roll_eyes::neutral_face:


Why did O(NK \log{N}) solutions pass when there is no constraint on the sum of N?


How can even O(N*K) pass? There were 100 test cases and there was nothing written like “Sum of N over all test cases can’t exceed 1e3”. Over 100 cases in worst case around 1e8 operations would be there and it would TLE


Can we have a recursive + memoization solution?


1e8 should pass in one second, but O(NK \log{K}) is 1e9 which obviously shouldn’t.

Can you provide the links to more questions like this?


I tried a recursive + memoization solution but got TLE. Here is the link: Solution: 49208630 | CodeChef


i tried so hard and get sofar BUT in the end there was no PARTIAL POINTS


I have been getting runtime error on my submission . Can anybody getting similar problem explain the reason of getting RE.

Did the same thing described in the editorial but got TLE. Why so tight constraints? Correct submissions are failing.
My submission

1 Like

Yeah I had the same problem using arrays but it was resolved when I used vectors. Don’t know how though.

The limit is around 4e8 afaik.

This solution has undefined behaviour on the sample testcase:

Compiling enigma20-BEAUSUB.cpp
+ g++ -std=c++14 enigma20-BEAUSUB.cpp -O3 -g3 -Wall -Wextra -Wconversion -DONLINE_JUDGE -D_GLIBCXX_DEBUG -fsanitize=undefined -ftrapv -fno-sanitize-recover
enigma20-BEAUSUB.cpp:4:0: warning: ignoring #pragma GCC optimization [-Wunknown-pragmas]
 #pragma GCC optimization("O3")
enigma20-BEAUSUB.cpp:5:0: warning: ignoring #pragma GCC optimization [-Wunknown-pragmas]
 #pragma GCC optimization("unroll-loops")
enigma20-BEAUSUB.cpp: In function ‘void answer_nikaal()’:
enigma20-BEAUSUB.cpp:93:16: warning: unused variable ‘x’ [-Wunused-variable]
     ll n,k,i,j,x;
enigma20-BEAUSUB.cpp: In function ‘int main()’:
enigma20-BEAUSUB.cpp:124:29: warning: format ‘%lld’ expects argument of type ‘long long int*’, but argument 2 has type ‘int*’ [-Wformat=]
enigma20-BEAUSUB.cpp: In function ‘void answer_nikaal()’:
enigma20-BEAUSUB.cpp:94:10: warning: ignoring return value of ‘int scanf(const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
enigma20-BEAUSUB.cpp:97:14: warning: ignoring return value of ‘int scanf(const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
enigma20-BEAUSUB.cpp: In function ‘int main()’:
enigma20-BEAUSUB.cpp:124:10: warning: ignoring return value of ‘int scanf(const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]
+ set +x
[~/devel/hackerrank/otherpeoples]>echo "4                                                       
4 1
1 1 2 3
9 2
1 2 3 2 4 5 6 7 5
5 5
1 1 1 1 1
10 1
1 2 1 2 1 2 1 2 1 2
" | ./a.out
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)

The warning about line 124 is important.

1 Like

My O(NKlogN) solution passes without doing any optimizations. I don’t think it was intended to pass, but yeah it did.

ll c[n],dp[n][k+1];

Tried fixing that but still getting TLE :frowning:


how come O(N * K * T) pass if N=1000, K=1000, T=100.
in one sec we can perform 1e8 operation right??
Solution: 49219058 | CodeChef this passed within 0.14s how???
is there anything which i am missing


See, I have seen several times ~1e8 operations getting TLE, and sometimes just avoids TLE. This is because actual time taken also depends on several factors like “cost of each operation”. The thing is 1e8 is borderline. They shouldn’t give such tight constraints, and I’m sure due to this many peeps having the correct O(NK) solution didn’t implement (including me). But upon seeing O(NKlogN) solution passing easily, it seems the problem forgot to mention something like sum of N*K over all testcases doesn’t exceed 1e6 or maybe there were weak test cases

1 Like

can anyone provide brute force solution NNK solution (just for learning)
i have did this but this is wrong on sample cases too

    int n,K; cin>>n>>K;
    vector<int> a(n,0);
    for(int i=0; i<n; i++){
    vector<vector<int>> dp(n+1,vector<int>(K+1,0));
    for(int i=1; i<=n; i++){
        for(int j=1; j<i; j++){
            for(int k=1; k<=K; k++){
                if(a[i-1] == a[j-1]){
                    dp[i][k] = 1 + dp[i-1][k];
                    dp[i][k] = 1 + dp[i-1][k-1];

1 Like