KCOMPRES - Editorial

likecs · August 13, 2018, 3:01pm

Problem Link

Practice

Contest

Author: Bhuvnesh Jain

Tester: Mark Mikhno

Editorialist: Bhuvnesh Jain

Difficulty

MEDIUM

Prerequisites

Binary Search, Greedy, Segment trees.

Problem

You are given a sequence of integers A_1, A_2, \dots, A_N. For an integer K, let’s define a K-compressed sequence B_1, B_2, \dots, B_N as follows:

for each valid i, B_i is a positive integer.
for each valid i, if there are exactly X numbers smaller than or equal to A_i in the subsequence A_{\mathop{max}(1, i-K)}, \dots, A_{\mathop{min}(N, i+K)}, then there must be exactly X numbers smaller than or equal to B_i in the subsequence B_{\mathop{max}(1, i-K)}, \dots, B_{\mathop{min}(N, i+K)}.
B_1 + B_2 + \dots + B_N is minimum possible.

For a given integer S, find the number of values of K (0 \le K \le N) such that the sum of elements of the K-compressed sequence does not exceed S.

Explanation

Let us first understand how to construct a K-compressed array B for a given array A and value K. We iterate over the numbers in increasing order and assign each one the smallest value that is larger than every value already assigned inside its window. Iterating in increasing order ensures that every smaller number in the window has already received its final, smallest possible value, so the value assigned to the current number can’t be decreased further and the sum can’t be minimised further. This proves our greedy strategy. Let me explain it through an example, where A = [4 2 8 1 4 3 8 1] and K = 3.

Let the current array B be [-, -, -, -, -, -, -, -]. We now group the numbers having the same value and iterate in increasing order. The following are the values of the array B after each step:

[-, -, -, 1, -, -, -, 1]. Since both 1s are the first numbers to be considered, we assign them the smallest possible number in B, i.e. 1.
[-, 2, -, 1, -, -, -, 1]. Index 2 has the number 1 smaller than it in the range [max(1, 2-3), min(8, 2+3)] = [1, 5], so we assign it the next larger number, i.e. 2.
[-, 2, -, 1, -, 2, -, 1]. Index 6 has the number 1 smaller than it in the range [3, 8], so we assign it the smallest number larger than everything assigned in this range, i.e. 2. Note that though A[2] < A[6], we can still have B[2] = B[6], as neither index lies inside the range of the other, so there is no conflict.
[3, 2, -, 1, 3, 2, -, 1]. For both index 1 and index 5, the largest number already assigned in their respective ranges is 2. So, we assign them the next larger number, i.e. 3.
[3, 2, 4, 1, 3, 2, 4, 1]. Similarly, the largest number assigned in the ranges of indices 3 and 7 is 3, so both get 4.

Thus, the minimum possible sum is (3+2+4+1+3+2+4+1) = 20. Make sure you are clear with the idea before you proceed further.
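The result of the walkthrough above can be sanity-checked with a small validator. This is only a sketch: it assumes the reading of the statement quoted in the author's note further down (relative order must be preserved for every pair of indices at distance at most K), it uses 0-based indices, and the name `preserves_local_order` is hypothetical rather than taken from any of the linked solutions.

```python
def preserves_local_order(a, b, k):
    """Return True if b keeps the relative order of a for all |i - j| <= k.

    Assumption: this is the order-preserving reading of the statement,
    not the literal "exactly X numbers" wording.
    """
    n = len(a)
    for i in range(n):
        # Only pairs (i, j) with j - i <= k are constrained.
        for j in range(i + 1, min(n, i + k + 1)):
            if (a[i] < a[j]) != (b[i] < b[j]):
                return False
            if (a[i] == a[j]) != (b[i] == b[j]):
                return False
    return True


a = [4, 2, 8, 1, 4, 3, 8, 1]
b = [3, 2, 4, 1, 3, 2, 4, 1]
print(preserves_local_order(a, b, 3), sum(b))
```

With K = 4 the same B would be rejected, because indices 2 and 6 (1-based) then fall into each other's range and A[2] < A[6] would require B[2] < B[6].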

Now there are some issues which might occur while implementing the above approach. The first is that all the numbers having the same value should be dealt with together, instead of simply iterating in increasing order. One simple counterexample is the array A = [10, 30, 30, 20, 10] and K = 1. The compressed array should be B = [1, 3, 3, 2, 1], but if we simply iterate over the numbers in increasing order, we might end up with B as [1, 2, 3, 2, 1] or [1, 2, 2, 2, 1], depending on the implementation.

The correct logic is to first group the numbers by their values and find what value we might end up giving each of them based on its range. Then, we need to check whether this tentative value is correct or not. For this, we again iterate over the numbers and group them if their ranges coincide with each other. We assign all the numbers in a group the largest of the tentative values of its members. For the example A = [10, 30, 30, 20, 10] and K = 1, the process is as follows:

The initial array is [-, -, -, -, -].
[1, -, -, -, 1]. We group the 10s first. Their ranges are [1, 2] and [4, 5]. Since they do not conflict, we assign each of them the minimum number separately. Both of them end up getting 1.
[1, -, -, 2, 1]. Since there is only one 20, we simply assign it the smallest number in the range [3, 5] which is not assigned yet, i.e. 2.
[1, 3, 3, 2, 1]. The ranges for 30 are [1, 3] and [2, 4]. Since they both coincide, we will assign them value together. The value we might end up giving A[2] is 2 since it is the smallest which is not used till now in the range [1, 3] in B. The value we might end up giving A[3] is 3, which is the smallest not used till now in the range [2, 4] for B. Since the ranges coincide, we should give each of them 3, the maximum of the number we might think of assigning them.
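The grouping steps above can be sketched as a naive O(N^2) routine. It is a minimal illustration, not the author's code: it assumes that two equal values belong to the same group when their indices are at most K apart (chained transitively), it uses 0-based indices, and the name `k_compress_naive` is hypothetical.

```python
from collections import defaultdict


def k_compress_naive(a, k):
    """Greedy K-compression; b[i] == 0 means "not assigned yet"."""
    n = len(a)
    b = [0] * n
    positions = defaultdict(list)
    for i, value in enumerate(a):
        positions[value].append(i)  # indices are stored in increasing order

    for value in sorted(positions):
        idx = positions[value]
        start = 0
        while start < len(idx):
            # Assumption: equal values at distance <= k must share one value,
            # so consecutive occurrences with a gap <= k form one group.
            end = start + 1
            while end < len(idx) and idx[end] - idx[end - 1] <= k:
                end += 1
            group = idx[start:end]
            # Largest value already assigned in any window of the group.
            best = 0
            for i in group:
                window = b[max(0, i - k): min(n, i + k + 1)]
                best = max(best, max(window))
            for i in group:
                b[i] = best + 1
            start = end
    return b


print(k_compress_naive([10, 30, 30, 20, 10], 1))
print(k_compress_naive([4, 2, 8, 1, 4, 3, 8, 1], 3))
```

Scanning each window costs O(K), which is what makes this version quadratic in the worst case.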

Thus, we can easily build the K-compressed array using the above ideas. But how fast can we do it?

Doing it naively will take O(N^2), as it will require you to find the smallest number not assigned yet in a range (one more than the largest number assigned in it), taking O(N) for this step alone. But if you restate this problem, it reduces to the following 2 operations:

Update the number at given index.
Find the largest number in a range.

This is a very familiar problem which can be solved using segment trees in O(\log{N}) for each operation. You can read about it here.

Thus, we can build a K-compressed array for a given array A and K in O(N * \log{N}) complexity.
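A possible O(N * \log{N}) version replaces the window scan with a max segment tree that supports the two operations listed above. This is a sketch under the same assumptions as before (order-preserving reading, equal values grouped when at most K apart, 0-based indices); `MaxSegmentTree` and `k_compress` are hypothetical names, not the author's implementation.

```python
class MaxSegmentTree:
    """Iterative segment tree: point update, range maximum."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (2 * n)

    def update(self, pos, value):
        i = pos + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def query(self, lo, hi):
        """Maximum over the half-open index range [lo, hi)."""
        res = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                res = max(res, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                res = max(res, self.tree[hi])
            lo //= 2
            hi //= 2
        return res


def k_compress(a, k):
    n = len(a)
    tree = MaxSegmentTree(n)
    b = [0] * n
    order = sorted(range(n), key=lambda i: (a[i], i))
    start = 0
    while start < n:
        # Chain equal values whose consecutive indices are at most k apart.
        end = start + 1
        while (end < n and a[order[end]] == a[order[start]]
               and order[end] - order[end - 1] <= k):
            end += 1
        # Gaps <= k make the windows of a chain overlap, so their union
        # is one contiguous range and a single query is enough.
        lo = max(0, order[start] - k)
        hi = min(n, order[end - 1] + k + 1)
        value = tree.query(lo, hi) + 1
        for i in order[start:end]:
            b[i] = value
            tree.update(i, value)
        start = end
    return b


print(k_compress([4, 2, 8, 1, 4, 3, 8, 1], 3))
```

Equal values in a different chain are more than K away from both ends of the current chain, so they never fall inside the queried range and it is safe to update the tree right away.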

To find which values of K lead to a compressed array having sum not exceeding S, we can simply iterate over all possible values of K and update the answer. This approach has a complexity of O(N * N * \log{N}), which is enough to pass the first 2 subtasks.

The next thing to note is that we can binary search on the answer. To prove this, we need to prove that the minimum sum of the K-compressed array does not decrease with increasing K. This again relies on the way we build our K-compressed array using a greedy algorithm. We make sure that each number is compressed to the smallest possible value, and the smallest value which might get assigned to a number can only increase if its range (or window) increases. Hence, the sum can’t decrease with increasing K.

Thus, the overall complexity of the approach reduces to O(N * {\log}^2{N}). This is enough to pass all the subtasks as well.
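The binary search over K can be sketched as follows. It assumes the minimum sum is non-decreasing in K, which holds under the order-preserving reading because a larger K only adds constrained pairs. To stay self-contained the sketch uses a simple quadratic `compressed_sum` helper, so its running time is O(N^2 * \log{N}) rather than the bound above; both function names are hypothetical.

```python
def compressed_sum(a, k):
    """Sum of the greedy K-compressed array (naive window scan)."""
    n = len(a)
    b = [0] * n
    order = sorted(range(n), key=lambda i: (a[i], i))
    start = 0
    while start < n:
        end = start + 1
        while (end < n and a[order[end]] == a[order[start]]
               and order[end] - order[end - 1] <= k):
            end += 1
        lo = max(0, order[start] - k)
        hi = min(n, order[end - 1] + k + 1)
        value = max(b[lo:hi]) + 1  # one more than the largest assigned value
        for i in order[start:end]:
            b[i] = value
        start = end
    return sum(b)


def count_valid_k(a, s):
    """Number of K in [0, N] whose compressed sum does not exceed s."""
    n = len(a)
    if compressed_sum(a, 0) > s:
        return 0
    lo, hi = 0, n  # invariant: compressed_sum(a, lo) <= s
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if compressed_sum(a, mid) <= s:
            lo = mid
        else:
            hi = mid - 1
    return lo + 1  # K = 0, 1, ..., lo are all valid


print(count_valid_k([10, 30, 30, 20, 10], 9))
print(count_valid_k([10, 30, 30, 20, 10], 10))
```

For K = 0 every element is alone in its window, so the sum is always N; this is why the search can start from `lo = 0` once that case has been checked.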

Once you are clear with the above idea, you can see the author's implementation below for help.

Feel free to share your approach as well, if it was somewhat different.

Note from Author:

The test data was a bit weak for the small subtasks, where some wrong greedy approaches passed as well. Though the large test cases ensured that wrong solutions could not pass for the full score, I still failed to generate stronger test data for the smaller ones. This is not completely our fault, as there are many ways the greedy solution for this compression could be written, where even index errors could happen while implementing. I would like to thank all the people who tested their solutions and helped me strengthen the tests (special thanks to Stepan). But still, we alone cannot come up with every possible greedy solution one might write, so stronger tests could not be prepared by me. By the time I came to know about this, it was already too late for a rejudge, but I will take care of it in future problems.

Also, the problem statement seemed quite tough to comprehend for most of the contestants. User acmonster even pointed out a flaw in the English statement. Below were his comments:

“I guess that the problem (and test data) actually requires that sequence B should preserve all the relative sizes for each index pair (i, j) such that |i - j| <= K. In other words, if A[i] < A[j], we should have B[i] < B[j]; if A[i] = A[j], we should have B[i] = B[j], etc.
If this is correct, the fix does not seem to be sufficient. Consider A = (1, 1, 1, 2, 3), B = (2, 1, 1, 1, 2), and K = 2: both sequences has a “signature” of (2, 2, 2, 2, 2), but, again, the optimal sequence B does not preserve A[1] = A[2] = A[3] < A[4] < A[5].”

Though most of the contestants got what the problem statement meant, I will make sure to make better statements in future as well.
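The gap between the literal statement and the order-preserving reading can be explored with an exhaustive search on tiny inputs. The sketch below implements the literal "exactly X numbers smaller than or equal" condition; it assumes that values in [1, N] are enough for an optimal B (only the relative order matters, so any valid B can be rank-compressed), and the names `signature` and `literal_min_sum` are hypothetical.

```python
from itertools import product


def signature(seq, k):
    """For each i, count elements <= seq[i] in the window [i - k, i + k]."""
    n = len(seq)
    sig = []
    for i in range(n):
        window = seq[max(0, i - k): min(n, i + k + 1)]
        sig.append(sum(1 for x in window if x <= seq[i]))
    return sig


def literal_min_sum(a, k):
    """Minimum sum of B under the literal statement (exponential search)."""
    n = len(a)
    target = signature(a, k)
    best = None
    # Assumption: an optimal B never needs a value above n.
    for b in product(range(1, n + 1), repeat=n):
        if signature(b, k) == target:
            total = sum(b)
            if best is None or total < best:
                best = total
    return best


print(literal_min_sum([1, 1, 2, 2], 1))
print(literal_min_sum([2, 1, 2, 3, 4], 1), literal_min_sum([2, 1, 2, 3, 4], 2))
```

For A = [2, 1, 2, 3, 4] the literal minimum is 12 for K = 1 and 11 for K = 2, so under the literal wording the sum is not monotonic in K. Under the order-preserving reading the sum for K = 2 is 12.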

Time Complexity

O(N * {\log}^2{N})

Space Complexity

O(N)

SOLUTIONS:

Author’s solution can be found here.

Tester’s solution can be found here.

Editorialist’s solution can be found here.

dhruvgupta167 · August 13, 2018, 4:44pm

https://www.codechef.com/viewsolution/19654812
Can anyone please help me? My solution passed all the test files except the last one, which is giving WA.
@include_sajal my solution was also giving TLE and WA at first. I got rid of the TLE but there is still one WA. Could you please tell me your approach for handling those test cases?

hrishabh0901 · August 14, 2018, 10:15am

Can anyone please explain how O(N * N * \log{N}) transformed into O(N * {\log}^2{N}) using binary search??

namanjain007 · August 14, 2018, 11:24am

@hrishabh0901
Instead of iterating over increasing values of K, you can do a binary search, as with increasing K your sum would also be non-decreasing…

kshitij_07 · August 14, 2018, 4:39pm

Can anyone please explain the fault in my code. I coded as per the editorial, using only the greedy approach (No Segment Tree) and getting AC on 6 subtasks out of 10 but no complete AC in any task.
@likecs it would be great if you could kindly help out.

My Code

decentgeek · August 14, 2018, 5:37pm

Please help me to find counter test case of my code of the problem KCOMPRES.
Only two task remaining.
My solution included segment tree approach.
My Solution

rohitp12 · August 14, 2018, 10:52pm

can someone explain the k-compress part of the question…thanks in advance
“for each valid i, if there are exactly X numbers smaller than or equal to A_i in the subsequence A_{\mathop{max}(1, i-K)}, \dots, A_{\mathop{min}(N, i+K)}, then there must be exactly X numbers smaller than or equal to B_i in the subsequence B_{\mathop{max}(1, i-K)}, \dots, B_{\mathop{min}(N, i+K)}.”

faizz7 · August 15, 2018, 12:18am

Managed to optimize my brute-force approach and solve it without segment tree.

Time and space complexity remains the same as that of this editorial.

Binary search was the key in solving this problem.

My solution: https://www.codechef.com/viewsolution/19673140

jlewis200 · August 15, 2018, 8:21am

I was able to solve all test cases during the competition with a completely different approach. The problem reminded me of software dependencies and I started looking at directed acyclic graphs and found that they fit this application fairly well. I sorted the list elements into a topological order where repeated elements were all given the same value. This is definitely similar to your grouping step in the editorial solution.

For k = 0, each compressed element is given a value of 1. I then iterated through the values of k from 1 to N. For each incremental value of k, and each list element, there are at most 2 new “dependencies.” For example, at k = 0, each element must be greater than or equal to at most 0 elements. At k = 1, each element must be greater than or equal to at most 2 elements. At k = 2, each element must be greater than or equal to at most 4 elements, etc. … After each iteration of k, I found the max of the dependencies and incremented or equated as necessary.

I did have an issue with solving the test cases for the largest values of N within the time constraints, but this was solved by recognizing that if S > (N * (N + 1)) / 2, then the result should be N + 1. That was my basic approach, but there was some optimization code to remove unnecessary dependencies, ones which would be resolved by other dependencies with a higher topological order. This was definitely a tough one, the comments section was full of frustrated and angry people, I was among them for a long time.

acmonster · August 16, 2018, 1:44pm

I think that the solution is incorrect, since the cost of the K-compressed sequence is not monotonic in K. Here is a simple counter-example.

Consider sequence A = (2, 1, 2, 3, 4). One can verify that one of its 1-compressed sequences is (2, 1, 2, 3, 4) itself, which has a cost of 12. However, one of its 2-compressed sequences is (3, 1, 2, 2, 3), which, in fact, has a lower cost of 11.

byomkeshbakshy · August 16, 2018, 8:18pm

@acmonster I didn’t get what you are trying to say…we want to find the k compressed sequence with lowest sum so 1-compressed sequence with minimum sum for {2 1 2 3 4} is {1 1 1 1 1} with minimum sum of 5 while 2 compressed sequence with minimum sum is {3 1 2 2 3} with sum 11. Why are u comparing any random k compressed sequence?

acmonster · August 17, 2018, 6:28am

Hi @byomkeshbakshy, you might want to check that the 1-compressed sequence with minimum sum for (2, 1, 2, 3, 4) is (2, 1, 2, 3, 4) itself, not (1, 1, 1, 1, 1). This shows that the minimum sum of K-compressed sequence is not necessarily non-decreasing in K.

ankush_953 · August 17, 2018, 11:27am

@likecs Please help me with this solution… What can be test cases It may be failing upon?
https://www.codechef.com/viewsolution/19759987

buda · August 17, 2018, 10:21pm

FWIW, I also thought binary search would be useful here upon seeing the problem, but then wrote a brute-force solver and found counterexamples. One’s intuition can be misleading.

roll_no_1 · August 18, 2018, 12:09pm

In the editorial, it is written to check whether the ranges of the identical elements coincide with each other, but it does not give the criteria for coinciding ranges. Example: For a = [2, 3, 4, 3] and k = 1, the range of the 3 at index 2 (assuming 1-based indexing) is [1, 3] and that for the 3 at index 4 is [3, 4]. But both of these can be mapped to different values. So, to check whether the ranges for 2 indices i and j (i < j) for a given value of k overlap or not, we need to check j - i <= k, and not i + k >= j - k.
In the example above, the optimal compression for k = 1 is [1, 2, 3, 1], while using the wrong condition it comes out to be [1, 2, 3, 2]. My solution with the wrong condition passed all but the last test file.

buda · August 18, 2018, 1:43pm

It doesn’t look to me like the described greedy solution works. I also had the intuition that something like that would work (e.g. “all equal values in A that are within each other’s range will be the same in B”) but I managed to disprove all of these intuitions that I had.

For example, for A = [1, 1, 2, 2] and K = 1, the setter’s solution computes B = [1, 1, 2, 2], but in fact, B = [2, 1, 1, 1] with a smaller sum of 5.

So I wonder if someone can explain a correct algorithm to compute the K-compressed sequence in O(nlgn) time.

koyaaniqatsi · August 19, 2018, 6:55pm

@admin, @likecs, is anything going to happen here?

I do not see how this:

I guess that the problem (and test
data) actually requires that sequence
B should preserve all the relative
sizes for each index pair

can be derived from the problem statement… Moreover, I bet that during AUG18 most people who submitted a greedy algorithm and got AC didn’t question their own solution (they got AC, you know) and just moved on to the next problems. That can’t justify an incomplete problem statement.

I just picked and tested some of the AC solutions from the top 30 AUG18 finishers; they all look wrong and fail on these ‘counterexamples’. You can easily check it yourselves using the following input:

2
4 6
3 2 1 1
6 12
4 3 1 2 1 2

and if you get following response from any solution:

1
2

it is a wrong answer. I don’t see how this whole situation is ok.

Besides, @acmonster and @buda have already pointed out these very flaws in the editorial solution. I developed a greedy algorithm as well at first during the contest, but on the first WA (a bug in the implementation), after more thorough thinking, I discovered similar ‘counterexamples’ myself. That obviously made the problem harder, so I ‘postponed’ it (and it turned out I never returned to it during the rest of the contest).

IMHO, either this problem should be removed from every contestant’s score or entire aug18-long should be unrated. Because It feels like bad precedent.

Am I missing any specific rules for such cases?

PS: how many people usually verify problem statements and solutions? Looks like this was missed even by the ‘tester’.

likecs · August 19, 2018, 7:37pm

@acmonster you are correct that the current statement doesn’t have a proof for binary search. Even the method for finding the k-compressed array will be wrong. I couldn’t frame the English statement correctly based on my idea for the question. As per your comments (which are posted in the editorial), the correct statement should have been:

“B should preserve all the relative sizes for each index pair (i, j) such that |i - j| <= K. In other words, if A[i] < A[j], we should have B[i] < B[j]; if A[i] = A[j], we should have B[i] = B[j], etc”

With this statement, the editorial is exactly in line with what you said. I apologise if someone faced a similar situation during the contest.

deepak_097 · August 19, 2018, 11:35pm

Can anyone please help me to debug my code of this problem

My code- link

I am getting WA on the last 3 test cases.

killerx · August 23, 2018, 1:20am

I implemented a brute force implementation during the contest which was almost same as the basic idea mentioned in the editorial i.e. updating number at given index and finding min in a given range.

I was targeting the first two sub-tasks at that moment.

However, even after spending quite a bit of time on it, it gave me wrong answers on some of the test files. I am posting my last solution here. Any help will be appreciated. Thanks.

Here’s my submission: CodeChef: Practical coding for everyone