×

# KCOMPRES - Editorial

Practice

Contest

Author: Bhuvnesh Jain

Tester: Mark Mikhno

Editorialist: Bhuvnesh Jain

MEDIUM

# Prerequisites

Binary Search, Greedy, Segment trees.

# Problem

You are given a sequence of integers $A_1, A_2, \dots, A_N$. For an integer $K$, let's define a $K$-compressed sequence $B_1, B_2, \dots, B_N$ as follows:

• for each valid $i$, $B_i$ is a positive integer.
• for each valid $i$, if there are exactly $X$ numbers smaller than or equal to $A_i$ in the subsequence $A_{\mathop{max}(1, i-K)}, \dots, A_{\mathop{min}(N, i+K)}$, then there must be exactly $X$ numbers smaller than or equal to $B_i$ in the subsequence $B_{\mathop{max}(1, i-K)}, \dots, B_{\mathop{min}(N, i+K)}$.
• $B_1 + B_2 + \dots + B_N$ is minimum possible.

For a given integer $S$, find the number of values of $K$ ($0 \le K \le N$) such that the sum of elements of the $K$-compressed sequence does not exceed $S$.

# Explanation

Let us first understand how to construct a $K-compressed$ array $B$ for given array $A$ and value $K$. For this, we will iterate over the numbers in increasing order and try to assign the smallest number not yet assigned to any number smaller than it until now. This will be optimal as we try to assign the smallest possible number to each number and iterating in increasing order ensures that if the number assigned to a given number can't be decreased further, the sum can't be minimised further. This proves our greedy strategy. Let me explain it through an example, where $A = [4 2 8 1 4 3 8 1]$ and $K = 3$.

Let the current array $B$ be $[-, -, -, -, -, -, -, -]$. We now group the numbers having the same value and iterate in increasing order. The following are the values of the array $B$ after each step:

1. $[-, -, -, 1, -, -, -, 1]$. Since both $1$ are the first number to be considered, we assign them the smallest possible number in $B$, i.e. $1$.

2. $[-, 2, -, 1, -, -, -, 1]$. Since, index $2$ has number $1$ less than it in range $[max(1, 2-3), min(8, 2+3)] = min[1, 5]$. We assign it the next biggest number.

3. $[-, 2, -, 1, -, 2, -, 1]$. Since, index $6$ has number $1$ less than it in range $[3, 8]$, we assign it next highest number not present in this range i.e. $2$. Note that though $A[2] < A[6]$, we can still have $B[2] = B[6]$ as there ranges do not coincide to give a conflict.

4. $[3, 2, -, 1, 3, 2, -, 1]$. For both index $1$ and $5$, we have it as the third largest number in their respective range. So, we assign them the next biggest number which was not used.

5. $[3, 2, 4, 1, 3, 2, 4, 1]$. This is similar to the above process.

Thus, the minimum possible sum is $(3+2+4+1+3+2+4+1) = 20$. Make sure you are clear with the idea before you proceed further.

Now there are some issues which might occur while implementing the above approach. The first thing is that we should deal with all the numbers having the same value together should be dealt together instead of simply iterating in increasing order. One simple counterexample for this is the array $A = [10, 30, 30, 20, 10]$ and $K = 1$. The compressed array should be $B = [1, 3, 3, 2, 1]$ itself but if we simply iterate over numbers in increasing order, we might end up getting array $B$ as $[1, 2, 3, 2, 1]$ or $[1, 2, 2, 2, 1]$ which is again highly dependent on your implementation.

The correct logic is to first group the numbers by their values. Find what value you might end up giving them based on their ranges. Then, we need to be sure that the value we might give is correct or now. For this, we again iterate over the numbers and group them if their ranges coincide with each other. We assign it the all the numbers in the group the largest number we thought of assigning to any number in the range. For the example $A = [10, 30, 30, 20, 10]$ and $K = 1$, the process is as follows:

1. The initial array is $[-, -, -, -, -]$.

2. $[1, -, -, -, 1]$. We group $10$ first. The ranges for them are $[1, 2]$ and $[4, 5]$. Since they do not conflict, we assign them separately the minimum number to them. Both of them end up getting $1$.

3. $[1, -, -, 2, 1]$. Since there is only one $20$, we simply assign it the smallest number in the range $[3, 5]$ which is not assigned yet, i.e. $2$.

4. $[1, 3, 3, 2, 1]$. The ranges for $30$ are $[1, 3]$ and $[2, 4]$. Since they both coincide, we will assign them value together. The value we might end up giving $A[2]$ is $2$ since it is the smallest which is not used till now in the range $[1, 3]$ in $B$. The value we might end up giving $A[3]$ is $3$, which is the smallest not used till now in the range $[2, 4]$ for $B$. Since the ranges coincide, we should give each of them $3$, the maximum of the number we might think of assigning them.

Thus, we can easily build the $K-compressed$ algorithm using the above ideas. But how fast can we do it?

Doing it naively will take $O(N^2)$ as it will require you to find the smallest number not assigned yet in a range, taking $O(N)$ for this step alone. But if you restate this problem, it is similar to the following $2$ operations:

• Update the number at given index.
• Find the largest number in a range.

This is a very familiar problem which can be solved using segment trees in $O(\log{N})$ for each operation. You can read about it here.

Thus, we can built a $K-compressed$ array for given array $A$ and $K$ in $O(N * \log{N})$ complexity.

To find what possible values of $K$ will lead to compressed array having sum less than $S$, we can simply iterate over all possible values of $K$ and update the answer. This approach has a complexity of $O(N * N * \log{N})$ which is enough to pass the first $2$ subtasks.

The next thing to note that we can binary search on the answer. To prove this, we need to prove that the minimum sum for $K-compressed$ array does not decrease with increasing $K$. This again relies on the way we build our $K-compressed$ array using a greeedy algorithm. Since we make sure that each number is compressed to the smallest possible value and the sum can't decrease with increasing $K$ as the smallest number which might get assigned to a number can only increase if it's range (or window) increases.

Thus, the overall complexity of the approach reduces to $O(N * {\log}^2{N})$. This is enough to pass all the subtasks as well.

Once, you are clear with the above idea, you can see the author implementation below for help.

Feel free to share your approach as well, if it was somewhat different.

## Note from Author:

The test case data was bit weak for the small subtasks where some wrong greedy approaches passed as well. Though the large test case ensured, wrong solutions could not pass for the full solution, but still, I failed in generating stronger test data for smaller ones. This is not completely our fault as there are many ways the greedy solution for this compression could be written where even index errors could happen which implementing. I would like to thank all the people who testing their solutions as well and helped me strengthen the test as well (Special thanks to Stepan). But still, we alone cannot come up with all possible greedy solution which one might write, so stronger test could not be prepared by me. By the time, I came to know about this it was already to late for a rejudge, but I will take care of it in the future problem.

Also, the problem statement seemed quite tough to comprehend for most of the contestants as well. User acmonster even pointed out a flaw in the English statement as well. Below were his comments:

"I guess that the problem (and test data) actually requires that sequence B should preserve all the relative sizes for each index pair (i, j) such that |i - j| <= K. In other words, if A[i] < A[j], we should have B[i] < B[j]; if A[i] = A[j], we should have B[i] = B[j], etc. If this is correct, the fix does not seem to be sufficient. Consider A = (1, 1, 1, 2, 3), B = (2, 1, 1, 1, 2), and K = 2: both sequences has a "signature" of (2, 2, 2, 2, 2), but, again, the optimal sequence B does not preserve A[1] = A[2] = A[3] < A[4] < A[5]."

Though most of the contestants got what the problem statement meant, I will make sure to make better statements in future as well.

# Time Complexity

$O(N * {\log}^2{N})$

# Space Complexity

$O(N)$

# SOLUTIONS:

Author's solution can be found here.

Tester's solution can be found here.

Editorialist's solution can be found here.

This question is marked "community wiki".

6★likecs
3.7k2481
accept rate: 9%

19.8k350498541

That thing with balancing the similar numbers took too much time for me to solve finally and come above all those TLEs and WAs, because of some strong test cases in the last subtask. But, really a great question in the first place. Worth it!

(13 Aug '18, 16:34)

 3 I think that the solution is incorrect, since the cost of the K-compressed sequence is not monotonic in K. Here is a simple counter-example. Consider sequence A = (2, 1, 2, 3, 4). One can verify that one of its 1-compressed sequences is (2, 1, 2, 3, 4) itself, which has a cost of 12. However, one of its 2-compressed sequence is (3, 1, 2, 2, 3), which, in fact, has a lower cost of 11. answered 16 Aug '18, 13:44 31●2 accept rate: 0% Also note that in the K = 2 case, though we have A[1] = A[3] in the original sequence, we can achieve a smaller sum only if we choose to assign different values to B[1] and B[3]. This shows that the greedy algorithm mentioned in the solution is not optimal. (16 Aug '18, 13:50) Hey @acmonster , if we assign $B_1$ and $B_3$ different values, then doesn't point $2$ of the question get violated? It says that if there are $X$ elements smaller than $A_i$ in that range, then this must be same for $B_i$ as well. For your sequence, there is $1$ element smaller than $A_1$ in range $[1,3]$, but $2$ elements smaller than $B_1$ in range $[1,3]$ which I feel make your sequence invalid. Can you please have a look? :) (16 Aug '18, 18:13) Hi @vijju123, the problem statement was updated and it now counts the number of numbers "smaller than or equal to" A[i] and B[i]. (16 Aug '18, 18:40) @vijju123 Moreover, under the "smaller than" version of the problem that you referred to, we can still construct a counter-example of A = (3, 4, 3, 2, 1), whose 1-compressed sequence is (1, 4, 3, 2, 1) (with sum 11), and whose 2-compressed sequence is (1, 3, 2, 2, 1) (with a smaller sum of 9). (16 Aug '18, 18:44) Oh! I missed the statement update. Sorry, and thanks for telling that :). I will forward your concern to @mgch to have a look. (16 Aug '18, 19:10) 1 Why ?? I statement is update. But there is no announcement. Rule for problem setter should be made. I he wants to add a "." also. Then he will be allowed to do so only after making an announcement on contest page. Btw when this update was made ?? (17 Aug '18, 13:37) My comment on Magic Set July'18 which nobody bothered to reply - "Test Case 1 - Subsequences of [1,2] are [1],[2],[1,2] sym of all these is 6. (Divisible by 3). So answer should be 1? Is answer 0 correct?" "When was "each" added in problem statement? And why is no announcement made after this change?" . I remember reading statement multiple times because I got intution that this is what statement is asking. But each was not present there. (17 Aug '18, 13:41) showing 5 of 7 show all
 2 Managed to optimize my brute-force approach and solve it without segment tree. Time and space complexity remains the same as that of this editorial. Binary search was the key in solving this problem. My solution: https://www.codechef.com/viewsolution/19673140 answered 15 Aug '18, 00:18 4★faizz7 31●2 accept rate: 0% 2 Thanks for letting me know this. (16 Aug '18, 22:14) likecs6★
 2 I doesn't look to me like the described greedy solution works. I also had the intuition that something like that would work (e.g. "all equal values in A that are within each other's range will be the same in B") but I managed to disprove all of these intuitions that I had. For example, for A = [1, 1, 2, 2] and K = 1, the setter's solution computes B = [1, 1, 2, 2], but in fact, B = [2, 1, 1, 1] with a smaller sum of 5. So I wonder if someone can explain a correct algorithm to compute the K-compressed sequence in O(nlgn) time. answered 18 Aug '18, 13:43 5★buda 21●3 accept rate: 0% @buda It is already stated in editorial that english version of statement is wrong and thus the editorial apporach too. My intended task was exactly the same as @acmonster version of the statement, mentioned in editorial but it seems there was a huge fault here from my side. (19 Aug '18, 19:45) likecs6★
 1 https://www.codechef.com/viewsolution/19654812 Can anyone please help me.My solution passed all the solution except the last one which is giving WA. @include_sajal my solution was also giving TLE and WA first.I got rid of the TLE but there is still one WA.Could please tell what approach of handling those testcase. answered 13 Aug '18, 16:44 11●1 accept rate: 0% Please tell me your B array for the input 1 20 2 3 6 7 5 3 5 6 2 9 1 2 7 0 9 3 6 0 6 2 6 (14 Aug '18, 17:27) at what value of k (15 Aug '18, 03:38) Oh yeah, sorry, for k=9. (16 Aug '18, 01:04)
 1 I was able to solve all test cases during the competition with a completely different approach. The problem reminded me of software dependencies and I started looking at directed acyclic graphs and found that they fit this application fairly well. I sorted the list elements into a topological order where repeated elements were all given the same value. This is definitely similar to your grouping step in the editorial solution. For k = 0, each compressed element is given a value of 1. I then iterated through the values of k from 1 to N. For each incremental value of k, and each list element, there are at most 2 new "dependencies." For example, at k = 0, each element must be greater than or equal to at most 0 elements. At k = 1, each element must be greater than or equal to at most 2 elements. At k = 2, each element must be greater than or equal to at most 4 elements, etc. ... After each iteration of k, I found the max of the dependencies and incremented or equivalated as necessary. I did have an issue with solving the test cases for the largest values of N within the time constraints, but this was solved by recognizing that if S > ((N * (N + 1) / 2 ), then the result should be N + 1. That was my basic approach, but there was some optimization code to remove unnecessary dependencies, ones which would be resolved by other dependencies with a higher topological order. This was definitely a tough one, the comments section was full of frustrated and angry people, I was among them for a long time. answered 15 Aug '18, 08:21 22●2 accept rate: 0% I've also observed this but couldn't come out with a solution during the contest... sure it's another way to solve this problem. (15 Aug '18, 08:35) I came up with this approach as well. What did you do to handle cases like [10, 30, 30, 20, 10], K = 1? Also, the expected complexity for this in my case was O(n^2 * lg(n)), since I was rebuilding my DAG for each K. Did you pass the last subtask with just the optimization on the value of S? (15 Aug '18, 23:11) rashomon4★ I retained the DAG for each incremental value of k and then just added the newest 0, 1 or 2 dependencies. Maintaining the dependencency list and eliminating the non-essential dependencies was key to getting this solution to work within the time constraints. I may have oversimplified my explanation a bit. For each list element I had what I called a "dependency" list, where the element in question must be greater than any element in the dependency list; and also an equivalency list, where the element in question must be greater than or equal to every element in the equivalency list. (16 Aug '18, 07:03) This approach mandates that k is iterated through from 1 to N, so the binary search approach isn't possible here. (16 Aug '18, 07:05) I came up with this kind of solution, too, but got some WAs for some unknown reason. Since I was using naive data structures in Haskell, I also got TLEs. (19 Aug '18, 06:50)
 0 Can anyone please explain how O(N∗N∗logN) transformed into O(N∗log2N) using binary search?? answered 14 Aug '18, 10:15 1 accept rate: 0% 1 Linear time is taken to traverse for each k, which can be reduced to logarithmic time if we use binary search for maximum k upto which sum<=s hold.. thus reducing one N to logN (15 Aug '18, 03:41)
 0 hrishabh0901 instead of iterating over increasing values of K,you can do bsearch as with increasing k,your sum would also be non decreasing..... answered 14 Aug '18, 11:24 21●2 accept rate: 0%
 0 Can anyone please explain the fault in my code. I coded as per the editorial, using only the greedy approach (No Segment Tree) and getting AC on 6 subtasks out of 10 but no complete AC in any task. @likecs it would be great if you could kindly help out. My Code answered 14 Aug '18, 16:39 36●4 accept rate: 6% There is a flaw in calculating colliding interval. You have considered that it can collide only once but there can be multiple places where it might collide. Eg : [1, 1, 1, 1, 1] and k = 2. All ranges are colliding but you don't consider such cases. (Note, your solution will be correct for the case I provided, it is just to give you intuition where it might fail). (19 Aug '18, 19:50) likecs6★
 0 Please help me to find counter test case of my code of the problem KCOMPRES. Only two task remaining. My solution included segment tree approach. My Solution answered 14 Aug '18, 17:37 11●2 accept rate: 0% You might want to change your custom comparator function. Particularly, if (a.first == b.first). Because of it, your program might have been entering in an infinite loop and due to which you got TLE even for small input. Its behaviour is unpredictable so probably this is the reason you got another WA. Change it and try running it again and see if you can get all your test cases passed. See here for more on this: https://stackoverflow.com/questions/18291620/why-will-stdsort-crash-if-the-comparison-function-is-not-as-operator Google more to find out about it. (15 Aug '18, 00:06) I Changed, But didn't work :( Also, Thank you for your time which you give it. (15 Aug '18, 16:04)
 0 can someone explain the k-compress part of the question...thanks in advance "for each valid i, if there are exactly X numbers smaller than or equal to Ai in the subsequence Amax(1,i−K),…,Amin(N,i+K), then there must be exactly X numbers smaller than or equal to Bi in the subsequence Bmax(1,i−K),…,Bmin(N,i+K)." answered 14 Aug '18, 22:52 3★rohitp12 2 accept rate: 0%
 0 @acmonster I didn't get what you are trying to say...we want to find the k compressed sequence with lowest sum so 1-compressed sequence with minimum sum for {2 1 2 3 4} is {1 1 1 1 1} with minimum sum of 5 while 2 compressed sequence with minimum sum is {3 1 2 2 3} with sum 11. Why are u comparing any random k compressed sequence? answered 16 Aug '18, 20:18 127●1●9 accept rate: 0% @acmonster ohh sorry!... {1 1 1 1 1} is 0 compressed sequence...yes you are right, then we cannot use binary search then. (17 Aug '18, 07:20)
 0 Hi @byomkeshbakshy, you might want to check that the 1-compressed sequence with minimum sum for (2, 1, 2, 3, 4) is (2, 1, 2, 3, 4) itself, not (1, 1, 1, 1, 1). This shows that the minimum sum of K-compressed sequence is not necessarily non-decreasing in K. answered 17 Aug '18, 06:28 31●2 accept rate: 0%
 0 @likecs Please help me with this solution... What can be test cases It may be failing upon? https://www.codechef.com/viewsolution/19759987 answered 17 Aug '18, 11:27 65●7 accept rate: 8%
 0 FWIW, I also thought binary search would be useful here upon seeing the problem, but then wrote a brute-force solver and found counterexamples. One's intuition can be misleading. answered 17 Aug '18, 22:21 5★buda 21●3 accept rate: 0%
 0 In the editorial, it is written to check whether the ranges of the identical elements coincide with each other, but it does not give the criteria for coinciding ranges. Example: For a = [2, 3, 4, 3] and k = 1, the range of the 3 at index 2(assuming 1 based indexing) is [1, 3] and that for the 3 at index 4 is [3, 4]. But both of these can be mapped to different values. So, to check whether the ranges for 2 indices i and j (i < j) for a given value of k overlap or not, we need to check j - i <= k, and not i + k <= j - k. In the example above, the optimal compression for k = 2 is [1, 2, 3, 1], while using the wrong condition it comes out to be [1, 2, 3, 2]. My solution with the wrong condition passed all but the last test file. answered 18 Aug '18, 12:09 71●3 accept rate: 13% It is already mentioned in editorial that small testset was weak. (19 Aug '18, 19:43) likecs6★
 0 @admin, @likecs, is anything going to happen here? I do not see how this: I guess that the problem (and test data) actually requires that sequence B should preserve all the relative sizes for each index pair can be derived from problem statement... Moreover, I bet, during AUG18 most people who submitted greedy algo and got AC - they didn't question their own solution(they got AC, you know) and just moved on to next problems. I just picked and tested some of AC solutions from top 30 aug18 finishers, all they look wrong and fail on these 'counterexamples' - you can easily check it yourselves using following input: 2 4 6 3 2 1 1 6 12 4 3 1 2 1 2  and if you get following response from any solution: 1 2  it is wrong answer. I don't think how this whole situation is ok. Besides to @acmonster and @buda have already pointed to flaws in editorial solution. I as well developed greedy algorithm at first during the contest but on first WA(bug in implementation), after more thorough thinking, I discovered similar 'counterexamples' myself. So that essentially made problem harder obviously and I 'postponed' it(and it turned out I never returned back to it during the rest of contest). IMHO, either this problem should be removed from every contestant's score or entire aug18-long should be unrated. Because It feels like bad precedent. Am I missing any specific rules for such cases? PS: how many people doe usually verify problem statements and solutions? Looks like this was missed even by 'tester'. answered 19 Aug '18, 18:55 1●1 accept rate: 0% @koyaaniqatsi, you are correct. I have already mentioned in the editorial that the test case for small dataset was weak and the english statement was horrible. About 4 different people tested their solution during the preparation, but everyone got the same logic and everything looked ok then. Regarding question being removed or contest being unrated, please contact @admin or @mgch. (19 Aug '18, 19:42) likecs6★
 0 @acmonster you correct that the current statement doesn't have the proof for binary search. Even the method for finding k-compressed array will be wrong. The framing of the english statement based on my idea for the question couldn't be framed correctly by me. As per your comments (which are posted in the editorial), the correct statement should have been: "B should preserve all the relative sizes for each index pair (i, j) such that |i - j| <= K. In other words, if A[i] < A[j], we should have B[i] < B[j]; if A[i] = A[j], we should have B[i] = B[j], etc" If this, the editorial is exactly in lines with what you said. I apologise if someone faced the similar situation during the contest. answered 19 Aug '18, 19:37 6★likecs 3.7k●24●81 accept rate: 9%
 0 Can anyone please help me to debug my code of this problem :( My code- link I am getting WA on the last 3 test cases. answered 19 Aug '18, 23:35 93●3 accept rate: 0%
 0 I implemented a brute force implementation during the contest which was almost same as the basic idea mentioned in the editorial i.e. updating number at given index and finding min in a given range. I was targeting the first two sub-tasks at that moment. However, even after spending quite a bit time on it, it gave me selectively wrong answer. I am posting my last solution here. Any help will be appreciated. Thanks. Here's my submission: https://www.codechef.com/viewsolution/19654286 answered 23 Aug '18, 01:20 3★killerx 1 accept rate: 0%
 0 can anyone explain what's wrong with following codes? I have checked codes many times but I could not figure out what causes my codes to causing SIGSEGV error. link to my solution - https://www.codechef.com/viewsolution/19877450 answered 26 Aug '18, 08:26 1 accept rate: 0%
 0 Can anyone please explain the fault in my code...I have given this question a lot of time still no improvment ; Got AC in 6 subtasks but no complete AC answered 26 Aug '18, 22:12 78●6 accept rate: 7% anyone? :0 (15 Sep '18, 22:37)
 0 link to my submission @likecs could you help?? Can anybody plz tell why I am getting wrong answer in the very last test case ??? I am stuck at it for too long !! Any help will be appreciated .... Thanks in advance ! answered 02 Oct '18, 03:35 1●1 accept rate: 0%
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×15,875
×2,658
×1,768
×1,056
×593
×196

question asked: 13 Aug '18, 15:01

question was seen: 8,042 times

last updated: 02 Oct '18, 03:37