ORAND - Editorial

ORAND - Editorial


Div-2 Contest
Div-1 Contest

Author & Editorialist: Mohamed Anany
Tester: Radoslav Dimitrov




Bitwise Convolutions


Given 2 arrays A and B, you start with a number V=0, and in each turn you replace V with its bitwise-or with some number in A or its bitwise-and with some number in B. You can stop any time. How many distinct results can you get?


The key observation is that you need at most log(MX) turns to get any number you can get. With that in mind, let’s say you have a vector v carrying all the results you can reach. If you want to see all the results you can get with one more bitwise-or, you should just convolve the array v with the array A using the bitwise-or convolution. Similarly for bitwise-and. If you alternate between bitwise-or and bitwise-and 20 times, you’ll get all the possible results.


Let’s call the number of bits in the largest number in input K. Say we look at a certain sequence of operations, and it gets us some result. Let’s look at the final operation. Assume it’s OR, and a similar argument will apply for AND.

If you’re OR-ing with 0, that doesn’t make any sense and you should throw away this operation. Otherwise, let’s look at the ones in the number we’re ORing with. These bits are going to be one in the result no matter what the previous operations were! It’s like everything we did in the previous operation just gets overwritten with ones and flushed down the toilet. So the previous operations only play with the rest of the bits. That means they have at most K-1 bits to play with. Remove that last operation and repeat this argument. Low and behold, that means the sequence’s length is at most K!

So any number that can be achieved, can be achieved in 20 turns. So that gives us an improvement in the naive solution:

  • Carry an array v giving you all the possible results you can get so far. Initially, it’s {0}.
  • We want to see which numbers we can get using one more bitwise-or operation. That is, we want all the possible results of (x \vee y) where x belongs to v and y belongs to A. These results are going to be added to v. The obvious way to do this is by iterating over all the pairs of elements of v and A.
  • Repeat step 2 but with bitwise-and.
  • Keep doing this until all the results are reached.

Our cute observation tells us we only need 20 steps to reach all the results.

The last piece of the puzzle is quite technical: look at step 2 (and 3) of the algorithm. We get all the possible results by a nested loop in O(MX^2). However, this could be directly accelerated to O(MXlog(MX)) using the bitwise convolutions. You can read more about this here.


Setter's Solution
#pragma GCC optimize ("O3")
#pragma GCC optimize ("unroll-loops")

#include <bits/stdc++.h>
using namespace std;
const int N = 1e5 + 5, MOD = 1e9 + 7;
#define f(i,a,b)    for(int i = a; i < b; i++)
int n, m;
int in1[1<<22],in2[1<<22],ans[1<<22];
template <typename T>
struct FWT {
    void fwt(T io[], int n,bool f) {
        for (int d = 1; d < n; d <<= 1) {
            for (int i = 0, m = d<<1; i < n; i += m) {
                for (int j = 0; j < d; j++) { /// Don't forget modulo if required
                    T x = io[i+j], y = io[i+j+d];
                // 	io[i+j] = (x+y), io[i+j+d] = (x-y);	// xor
                        io[i+j] = x+y; // and
                        if(io[i+j] >= MOD)io[i+j] -= MOD;
                    else {
                        io[i+j+d] = x+y; // or
                        if(io[i+j+d] >= MOD)io[i+j+d] -= MOD;
    void ufwt(T io[], int n, bool f) {
        for (int d = 1; d < n; d <<= 1) {
            for (int i = 0, m = d<<1; i < n; i += m) {
                for (int j = 0; j < d; j++) { /// Don't forget modulo if required
                    T x = io[i+j], y = io[i+j+d];
                     /// Modular inverse if required here
                // 	io[i+j] = (x+y)>>1, io[i+j+d] = (x-y)>>1; // xor
                        io[i+j] = x-y; // and
                        if(io[i+j] < 0)io[i+j] += MOD;
                        io[i+j+d] = y-x; // or
                        if(io[i+j+d] < 0)io[i+j+d] += MOD;
    // a, b are two polynomials and n is size which is power of two
    void convolution(T a[], T b[], int n,bool f) {
        fwt(a, n, f);
        for (int i = 0; i < n; i++){
            a[i] = 1ll * a[i]*b[i] %MOD;
        ufwt(a, n, f);
        for (int i = 0; i < n; i++){
            if(a[i] > 1)a[i] = 1;
    // for a*a
    void self_convolution(T a[], int n) {
        for (int i = 0; i < n; i++)
            a[i] = 1ll * a[i]*a[i] %MOD;
        for (int i = 0; i < n; i++){
            if(a[i] > 1)a[i] = 1;
FWT<int> fwt;
int solve(vector<int> a, vector<int> b){ ///O(K^2 * 2 ^ K)
    memset(in1,0,sizeof in1);
    memset(in2,0,sizeof in2);
    memset(ans, 0, sizeof ans);
    ans[0] = 1;
    in1[0] = 1;
    int mx = max(*max_element(a.begin(),a.end()), *max_element(b.begin(),b.end()));
    int K = 0;
    while((1 << K) <= mx)K++;
    in2[(1<<K)-1] = 1;
        in1[a[i]] = 1;
        in2[b[i]] = 1;
    for(int k = 0; k < K+K; k++){
    return accumulate(ans,ans+(1<<K),0);
int main(){
    int t;
    cin >> t;
        cin >> n >> m;
        vector<int> a(n), b(m);
        f(i,0,n)    cin >> a[i];
        f(i,0,m)    cin >> b[i];
        cout << solve(a,b) << endl;
Tester's Solution

indent whole code by 4 spaces


As always, different approach is welcome in the comments

1 Like

any resource for learning BITWISE Convolutions?


What about test with T = 1e6 and n = 1, m = 1 A[1] = 1 << 19


Exactly man, no constraint on T confused the hell out of me. I thought there was no way we could maintain all the values for every test case within TL, so there must be some way of counting them without actually storing them.

1 Like

I had the same doubt. After using some assertions I found that none of the test files had 80 or more test cases. The main task probably has even less.

1 Like

garbage problem


Why garbage? I’d like to know your opinion.

1 Like

It is clear that there should have been a constraint on T. Consider a test case with all powers of 2 upto 2^{20} in A. With just 20 numbers, all 2^{20} values are possible as V. Now repeat this test case 2^{20}/20 times as sum of N <= 2^{20}. Any solution that stores all the possible values(including this one), will accumulate all numbers for each test case and thus pass TL, hence would be incorrect. Am I wrong here?

1 Like

What is X in the above explanation? :thinking:

Actually I have different solution. But despite it passed all testcases I still think that it is hackable.
Here is my idea. First of all, let’s mention that there can be ‘useless’ elements in initial arrays A and B. For example, if A=[1, 2, 3]. Then 3 is ‘useless’, because 3 = 1| 2
Then I just delete all ‘useless’ values from A.
Almost the same we can do with array B. May be this part can be called as finding or(and)-basis.
After that I run dfs from vertex 0 and just check whether I can go to any unvisited vertex using new(cleared) arrays A and B.
Code: bkZLow - Online C++0x Compiler & Debugging Tool - Ideone.com

I run this algorithm on max(numbers with 20 bits) random cases and it work quite fast.
Still cannot understand if this solution should pass or I just got lucky :slight_smile:


I had a similar approach. But I decided to give up (╯°□°)╯︵ ┻━┻