BLKJK - Editorial

PROBLEM LINK:

Practice
Div-2 Contest
Div-1 Contest

Author & Editroialist: Mohamed Anany
Tester: Radoslav Dimitrov

DIFFICULTY:

Medium-Hard

PREREQUISITES:

Knapsack, Bitset

PROBLEM:

There are N cards with numbers written on them arranged in a stack. The cards are winning if there’s any prefix with sum between X and Y. Find the minimum number of swaps you need to make the cards winning.

QUICK EXPLANATION:

Let pre_{i_1,j_1,s_1} denote whether the prefix of length i_1 has a subsequence of length j_1 and sum s_1. Calculate a similar DP for the suffix. Try every position to split the array, and then pick a subsequence from the prefix and another from the suffix with total sum between X and Y, minimizing the number of elements from the suffix. You can do that in cubic time by trying every valid state from the prefix and checking if there’s a match for it in the suffix, and then you can divide the constant by 32 by calculating and merging the DP with bitsets.

EXPLANATION:

Let’s look at the final configuration of the cards. Let’s say the prefix with sum between X and Y has length i. We’ll split the array into 2 parts: the prefix with length i and the suffix with length N-i. That’s a convention I’ll use throughout the editorial. Let’s look at the cards in that prefix. Some of them were already there and didn’t need any swaps, let their count be j_1. The rest were swapped between the suffix and the prefix, let their count be j_2=i-j_1.

Now, I’ll describe a cubic solution. Let pre_{i_1,j_1,s_1} denote whether the prefix of length i_1 has a subsequence of length j_1 and sum s_1. This array is very easy to calculate using knapsack-like DP. Let’s calculate a similar DP suf_{i_2,j_2,s_2} for the suffix. What we’re interested in doing is matching a subsequence of the prefix with one of the suffix such that:

  • j_1+j_2=i.
  • X \le s_1+s_2 \le Y.
  • j_2 is as small as possible.

What that means in words is: we’ll choose j_2 elements with sum s_2 from the suffix, and we’ll swap them with some elements from the prefix (making j_2 swaps.) We want the sum to be between X and Y, and of course we want to minimize the number of swaps, j_2.

This is easy to do. We’ll just fix i_1, j_1, and s_1, and we’ll see if this is possible using the array suf. If it is, then i_2=n-i_1, j_2=i_1-j_1, and s2 is in a certain range. We can check if any such subsequence is possible with a cumulative array on the DP suf. If it is, minimize with j_2.

Now, how to accelerate this? For starters, we can implement the arrays pre and suf using bitsets to divide the constant by 32. However, that’s not enough. We still have another cubic loop to calculate the answer, and we can’t really improve its constant. We need a better idea.

Maybe we should decrease the number of states we have to check in that loop. For every pair (j_1,s_1), instead of trying every possible i_1, how about we only try the smallest possible i_1? You can calculate that, either by binary search on the array pre, or by looking at which states change from 0 to 1 when you’re transitioning from pre_{i_1-1,j_1,s_1} to pre_{i_1,j_1,s_1}. Do the same for the suffix. That changes the number of states we need to check from cubic to quadratic! We however need to adjust our constraints a little. We now need:

  • j_1+j_2 \ge i_1. Notice the difference between this constraint here and the original solution. The = has to change to \ge because we only check the smallest possible i_1. You can rewrite this as: j_2 \ge i_1-j_1.
  • X \le s_1+s_2 \le Y.
  • j_2 is as small as possible.

Notice that i_1=n-i_2. We can maybe iterate over all the possible (i_2,j_2,s_2) such that suf_{i_2,j_2,s_2} is 1, provided that i_2 is as small as possible. Remember, we did that to iterate over quadratically-many states instead of cubically-many. Now, we want to match a subsequence from the prefix with this one, satisfying the constraints above. Namely, i_1=n-i_2, i_1-j_1 \le j_2, and s2 belongs in a certain range. You may think we can build some cumulative array over the sums to check if such match exists in O(1) and we’d be done, but you’d be wrong; there’s a subtle point which is: the cumulative array itself would take cubic time to calculate :face_palm:

However, this problem is not so hard to rectify. The problem now basically reduced to: you have a bitset, and you want to check if there’s a one in some range. You’ll solve this using, low and behold, yet another bitset. For every range, we’ll compute a bitset containing ones in this range and zeros everywhere else. We’ll take the bitwise-and with our bitset and check if it has any ones.

So finally, every cubic part in our solution uses a bitset, so the constant is really small and the code can easily pass in one second.

Setter's Solution
#pragma GCC optimize ("O3")
#pragma GCC optimize ("unroll-loops")

#include "bits/stdc++.h"
using namespace std;

#define pb push_back
#define F first
#define S second
#define f(i,a,b)  for(int i = a; i < b; i++)
#define endl '\n'

using ll = long long;
using db = long double;
using ii = pair<int, int>;

const int N = 1005;

bitset<N> b[N],pre[N][N],br[N][N];
int a[N];
bool check(int i,int j,int l,int r){
    return (pre[i][j]&br[l][r]).count();
}
int main(){
    b[0][0]=1;
    for (int i=0;i<=1000;i++){
        for (int j=i;j<=1000;j++){
            if (i!=j)
                br[i][j]=br[i][j-1];
            br[i][j][j]=1;
        }
    }
    int t;
    cin >> t;
    while (t--){
        int n,A,B;
        cin >> n >> A >> B;
        f(i,1,n+1)  cin >> a[i];
        f(i,1,n+1)  b[i].reset();
        f(i,0,n+1)
            f(j,0,B+1)
                pre[i][j].reset();
        pre[0][0][0]=1;
        for (int i=1;i<=n;i++){
            for (int j=i;j>0;j--){
                auto cur=(b[j-1]<<a[i]);
                cur=(cur^(cur&b[j]));
                b[j]|=cur;
                while (!cur.none()){
                    int tmp=cur._Find_first();
                    cur[tmp]=0;
                    if (tmp<=B)
                        pre[i][i-j][tmp]=1;
                }
            }
        }
        f(i,0,n+1){
            f(j,0,n+1){
                if (i)
                    pre[i][j]|=pre[i-1][j];
                if (j)
                    pre[i][j]|=pre[i][j-1];
            }
        }
        f(i,1,n+1)  b[i].reset();
        int ans=1e9;
        if (check(n,0,A,B))
            ans = 0;
        for (int i=n;i>0;i--){
            for (int j=n-i+1;j>0;j--){
                auto cur=(b[j-1]<<a[i]);
                cur=(cur^(cur&b[j]));
                b[j]|=cur;
                while (!cur.none()){
                    int tmp=cur._Find_first();
                    cur[tmp]=0;
                    if (tmp<=B && check(i-1,j,max(A-tmp,0),B-tmp))
                        ans=min(ans,j);
                }
            }
        }
        if (ans==1e9)
            ans=-1;
        cout << ans << '\n';
    }
}
Tester's Solution

This text will be hidden

VIDEO EDITORIAL:

2 Likes

@triplem5ds https://www.codechef.com/viewsolution/41561012
what was there on the task 7??

Very poor editorial. It required more details. It was good until last two paragraphs where you used plain english and skipped formal details.
It require two to three more paragraphs to explain it clearly.

1 Like

How to solve this problem in pyhton3 ?
Is there any alternate of bitset ?

2 Likes

How to <<,>> bitSet class in java? Best to my knowledge it’s not possible. Question’s optimization is c++ exclusive.?

1 Like

can anyone tell what’s wrong with my approach
https://www.codechef.com/viewsolution/41558443

#include<bits/stdc++.h>
using namespace std;
int black_jack(long * box,long n,long s,long e,int ** dp,vector v,int count=0,int req=0)
{
if(e<0)
{
return INT_MAX;
}
if(s<=0 && e>=0)
{
if(req==v.size())
{
return 0;
}
int count=0;
for(int i=0;i<req && i<v.size();i++)
{
if(v[i])
{
count++;
}
}
int a=req-count;
// if(dp[s1][n]!=-1)
// {
// if(dp[s1][n]>a)
// {
// dp[s1][n]=a;
// return dp[s1][n];
// }
// return dp[s1][n];
// }
return a;
}
if(n==0)
{
return INT_MAX;
}
if(dp[s][n]!=-1)
{
return dp[s][n];
}
v.push_back(true);
int x=black_jack(box+1,n-1,s-box[0],e-box[0],dp,v,count+1,req+1);
v.pop_back();
v.push_back(false);
int y=black_jack(box+1,n-1,s,e,dp,v,count+1,req);
dp[s][n]=min(x,y);
return dp[s][n];
}
int black_jack_help(long * box,long n,long s,long e)
{
int ** dp=new int *[s+1];
for(int i=0;i<=s;i++)
{
dp[i]=new int [n+1];
for(int j=0;j<=n;j++)
{
dp[i][j]=-1;
}
}
vector v;
dp[s][n]=black_jack(box,n,s,e,dp,v,0,0);
return dp[s][n];
}
int main()
{
int t;
cin>>t;
while(t–)
{
long n,s,e;
cin>>n>>s>>e;
long * box=new long[n];
for(int i=0;i<n;i++)
{
cin>>box[i];
}
int ans=black_jack_help(box,n,s,e);
if(ans==INT_MAX)
{
cout<<-1<<endl;
}
else
{
cout<<ans<<endl;

    }
}

}
Out of 13 test cases in black jack problem , i am getting TLE in 2 test cases and 2 wrong answers , so can anybody tell me where i am going wrong

2 Likes

Facing same problem with task 7 and 12.

1 Like

I have another solution. It’s time complexity would be O(N*Y).

In my solution i have used two dimensional Dynamic Programming. DP[i][j] stores the minimum number of holes required to get the sum j in the prefix of length i. Here holes denote the number of elements not chosen in getting the sum j (i.e number of swapped out elements) . Calculating it take N*Y time.

Similarly i calculate DP_rev[i][j] which stores the minimum number of elements required to get the sum j in the suffix of length (N-i+1).

Now the minimum of max(DP[i][j1], DP[i+1][j2]) where X <= j1 + j2 <= Y for each i is the final answer.

11 Likes

Your 4 lines of explanation are better than editorial. It really helped :slight_smile:

1 Like

what is max(A) here means?

Nice solution indeed, Naveen.
Seems It’s based on observation that if we have A holes on the left side and B spots to move on the right side, we always can construct solution with exactly max(A, B) swaps.
Just curious, have you seen it/prove before started implement algorithm or just got lucky? :wink:

Could you explain a bit more about how you use DP_rev?

Sorry it’s Y not max(A)

I initially had n^2 * Y algorithm. I was thinking is there a way to reduce the dimension of DP . Then i thought to store the minimum holes in DP[i][j] to reduce the time complexity. As you said I realized that we can construct a solution with max(A, B) swaps. So i proved it and started implementing it.

Sorry there was a typo.

Last line is
Now the minimum of max(DP[i][j1], DP_rev[i+1][j2]) where X <= j1 + j2 <= Y for each i is the final answer.

I hope my solution is clear now.

Thanks for your solution.
I tried implementing your solution and decided it was enough to look at only those cases where A=B and take the min of those. (it seemed more natural than realizing that a solution can be constructed with max(A,B) swaps, though I understand that it can be, but another case with A=B will cover any optimal solution).
Again, I am yet to comprehend the editorial because it includes new concepts but thanks a lot for your solution

Sorry to say this but I am almost convinced you can’t solve this in Python3. (at least not without some very special optimizations that normal people don’t know)
I spent a lot of time on this problem, around 15h during the contest, and more afterwards.

During the contest itself, I could barely run one O(N*Y) loop to get an AC on 22 points, that also took close to 4s.
The elegant solution posted by Naveen in this thread is O(N*Y) and I was expecting that it would at least work because it’s the best solution I have seen to this problem and Codechef gives extra time unlike Codeforces for Python developers. Anyway, talking of actual results, even running the first loop to solve prefix DP takes 0.01 for the 1st test case in CPP and 5.01 TLE in Python.
You can see results here:
https://www.codechef.com/viewsolution/41676768
https://www.codechef.com/viewsolution/41673176

Shocked? I was also, but there are certain operations where Python fails extremely badly, and just initializing and iterating through a 2d list is one of those (numpy isn’t available here).

So in a way, what websites like Codeforces imply more directly by giving same time limit for Python and CPP can help people get over the inertia to code in Python alone.
That said, at least on Codechef, most problems can be solved in Python. However, there will always be problems like this once a while, more often on the harder problems where your language will let you down. I am yet to see a submission more than 34 points using Python for this problem and will be surprised to see one in the future as well.
If you love Python and are ok with not being able to do some problems, then no need to switch. But if you want to be excellent in the long term, it will definitely be a hindrance. Even the problem setters think in c++, and concepts like bitsets will show up all the time. Not only does python not have bitsets, but without numpy and other libraries, you can’t even do basic operations efficiently

Pass with Python is hard, but possible:
https://www.codechef.com/viewsolution/41695283
Indeed allocating whole 1000*1000 is slow, but we can use classic space reduction trick.

2 Likes

Thanks for implementing and sharing this. Very elegant to read and a very helpful trick!