





Setter: Bogdan Ciobanu
Tester: Misha Chorniy
Editorialist: Taranpreet Singh




Xor operation, Precomputation, Meet in the middle.


Given a sequence $A_0$ of length $N$, indexed 1-based, we define an operation $f$ on the array as $A_x[i] = A_{x-1}[i] \oplus A_{x-1}[i+1] \; \forall i \in [1, N-1]$ and $A_x[N] = A_{x-1}[N]$.

We have to answer queries: given $T$ and $P$, report the value of $A_T[P]$. Note that $T \leq 10^9$.


  • A simple but slow solution is to build the whole table $A_x$, which takes $O(N^2)$ time.
  • Instead, we precompute the arrays $A_x$ for the lowest $2^B$ values of $x$.
  • Then, for every query, we jump directly over blocks of size $2^B$ and use the precomputed values to get the xor-sum of the values contributed by each block.


For this problem I'll discuss two slow solutions, the first to illustrate how the process actually works, followed by a second one, which we will finally optimize via precomputation.

First of all, let's make an observation: the sequence $A_x$ is the same as $A_0$ whenever $x$ is a multiple of the smallest power of two $\geq N$; that is, the period of sequence $A$ is $M = 2^{\lceil \log_2 N \rceil}$.

Proof sketch: after $M$ operations, each element's effect on the elements to its left gets neutralised because of the property $A \oplus B \oplus B = A$. Due to this property, $A_0[j]$ contributes to $A_x[i]$ if and only if, after $x$ operations, $A_0[j]$ has been xored into position $i$ an odd number of times.

See the example array 1 2 4 8 16:

0: 1 2 4 8 16
1: 3 6 12 24 16
2: 5 10 20 8 16
3: 15 30 28 24 16
4: 17 2 4 8 16
5: 19 6 12 24 16
6: 21 10 20 8 16
7: 31 30 28 24 16
8: 1 2 4 8 16
9: 3 6 12 24 16

Observe that the last element never changes, the last 2 elements change with period 2, the last 3 and last 4 elements change with period 4, and for $N = 5$ the whole sequence repeats after 8 operations, which matches $M = 2^{\lceil \log_2 N \rceil}$.
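
As a quick sanity check, the periodicity claim can be verified directly in Python (a small illustrative sketch, not the reference solution):

```python
# One application of f: a[i] ^= a[i+1]; the last element is unchanged.
def step(a):
    return [a[i] ^ a[i + 1] for i in range(len(a) - 1)] + [a[-1]]

a0 = [1, 2, 4, 8, 16]                  # the example array above, N = 5
M = 1 << (len(a0) - 1).bit_length()    # smallest power of two >= N
a = a0
for _ in range(M):
    a = step(a)
print(M, a == a0)   # 8 True: the sequence has period dividing M
```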

First slow solution

We build a table $B[x][i]$, where $B[x][i]$ represents $A_x[i]$. We can compute this table for $x$ up to $M = 2^{\lceil \log_2 N \rceil}$. Since the sequence repeats after $M$ operations, we can always reduce $T \bmod M$ and answer the query. It is easy to calculate the whole table in $O(N \cdot M)$ time, which will time out.
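
This table-based solution can be sketched in Python as follows (names are illustrative):

```python
def build_table(a):
    # B[x] holds A_x; M rows suffice because the sequence is periodic
    # with period dividing M = 2^ceil(log2(N)).
    n = len(a)
    M = 1 << (n - 1).bit_length()
    B = [a]
    for _ in range(M - 1):
        prev = B[-1]
        B.append([prev[i] ^ prev[i + 1] for i in range(n - 1)] + [prev[-1]])
    return B, M

def query(B, M, t, p):
    # p is 1-based; reduce t modulo the period M.
    return B[t % M][p - 1]

B, M = build_table([1, 2, 4, 8, 16])
print(query(B, M, 5, 1))   # 19, matching row 5 of the table above
```

Building the table costs $O(N \cdot M)$ time and memory, which is exactly what makes this approach too slow for large $N$.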

Second slow solution

Consider a query $(T, P)$.

Lemma: consider all submasks $mask$ of $T$ written in binary. The final answer is the xor-sum of $A_0[P+mask]$ over all submasks of $T$, restricted to valid positions ($P+mask \leq N$).

Proof: consider the following table, where a 1 in column $i$ of row $T$ means that $A[T][P]$ includes $A[0][P+i]$ in its xor-sum (columns are numbered from 0).

0: 1
1: 1 1
2: 1 0 1
3: 1 1 1 1
4: 1 0 0 0 1
5: 1 1 0 0 1 1
6: 1 0 1 0 1 0 1
7: 1 1 1 1 1 1 1 1
8: 1 0 0 0 0 0 0 0 1
9: 1 1 0 0 0 0 0 0 1 1
10: 1 0 1 0 0 0 0 0 1 0 1
11: 1 1 1 1 0 0 0 0 1 1 1 1
12: 1 0 0 0 1 0 0 0 1 0 0 0 1
13: 1 1 0 0 1 1 0 0 1 1 0 0 1 1
14: 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
15: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The first column is always 1, because $A[X][P]$ always includes the contribution of $A[0][P]$.

After 0 operations, $A[0][P]$ is just as given in input.

After 1 operation, $A[1][P] = A[0][P] \oplus A[0][P+1]$.

After 2 operations, $A[2][P] = A[1][P] \oplus A[1][P+1] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+1] \oplus A[0][P+2] = A[0][P] \oplus A[0][P+2]$

After 3 operations, $A[3][P] = A[2][P] \oplus A[2][P+1] = A[0][P] \oplus A[0][P+2] \oplus A[0][P+1] \oplus A[0][P+3]$

If we simulate this process one more step, we can see that $A[4][P] = A[0][P] \oplus A[0][P+4]$.

This works the same way as sub-masks of T.

At T = 1, Submasks of T are 0 and 1, which means, $A[1][P] = A[0][P] \oplus A[0][P+1]$

At T = 2, Submasks of T are 0 and 2, which means, $A[2][P] = A[0][P] \oplus A[0][P+2]$

At T = 3, Submasks of T are 0, 1, 2 and 3, which means, $A[3][P] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+2] \oplus A[0][P+3]$

Notice how the contribution of $A[0][P+1]$ disappeared: in exactly the same way, 1 is a submask of 1, while 1 is not a submask of 2.

We can use this fact to make another slow solution: for a given query $(T, P)$, we check every position $X \in [P, N]$, and if $X-P$ is a submask of $T$, we xor $A[0][X]$ into the answer.

This solution requires no preprocessing but takes $O(N)$ time per query, making the total $O(Q \cdot N)$, which once again exceeds the time limit.
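
The submask characterization above translates directly into this per-query sketch (Python, names illustrative; $T$ is assumed already reduced modulo the period):

```python
def query_submask(a, t, p):
    # A_t[p] is the xor of a[p + d] over all d that are submasks of t
    # (d & t == d), restricted to valid positions p + d <= N (1-based p).
    n = len(a)
    result = 0
    for x in range(p, n + 1):   # candidate positions p..N, 1-based
        d = x - p
        if d & t == d:          # d is a submask of t
            result ^= a[x - 1]
    return result

a = [1, 2, 4, 8, 16]
print(query_submask(a, 5, 1))   # 19 = 1 ^ 2 ^ 16 (offsets 0, 1, 4)
```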

Faster solution

We will combine both solutions. First of all, observe the following pattern in the first four rows of the table above.

0: 1 0 0 0
1: 1 1 0 0
2: 1 0 1 0
3: 1 1 1 1

We can see that the set of positions included from the range $[P, P+4)$ depends only on $T \bmod 4$, because these patterns repeat with a cycle of 4.

Now, if we precompute the table for the first 4 values of $x$, we can skip over 4 positions at a time.

For example, suppose $T = 5$ for some position $P$. Then $A[5][P] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+4] \oplus A[0][P+5]$.

Now look at the last two bits of $T$ written in binary: they are 01, which is 1.

The key point is that $A[1][P] = A[0][P] \oplus A[0][P+1]$ and, in the same way, $A[1][P+4] = A[0][P+4] \oplus A[0][P+5]$.

Hence $A[5][P] = A[1][P] \oplus A[1][P+4]$. The reason this works is that if we have computed the table for all $x < X$, where $X = 2^B$ for some $B$, we have effectively divided the array into blocks of size $X$: the lower $B$ bits of $T$ decide which values within each block are included in the answer, while the upper bits decide which blocks are included.

Suppose we have removed the lower $B$ bits of $T$. Then, by the same reasoning as before, the xor-sum includes a value from exactly those blocks whose indices are submasks of the stripped $T$; the lower bits automatically choose the right precomputed row within each block.

For example, let $T = 14 = 1110_2$ and $B = 2$. The lower $B$ bits of $T$ are 10, which is 2, and $T$ after removing the lower $B$ bits is 11, which is 3. Now we can rewrite $A[14][P]$ as $A[2][P] \oplus A[2][P+1 \cdot 2^B] \oplus A[2][P+2 \cdot 2^B] \oplus A[2][P+3 \cdot 2^B]$.

The reason for multiplying by $2^B$ is that we skip directly to the next block, which is $2^B$ positions away from the current one. For each chosen block, the precomputed row already supplies the xor-sum of the required values within that block of size $2^B$.

Here too, we iterate over the submasks of $T$ with its lower $B$ bits removed, and for each submask we include the required value from the block corresponding to that submask.

Hence, we solve the problem in $O(N \cdot X + Q \cdot N/X)$ time for $X = 2^B$. By choosing a suitable value of $B$, we can solve the problem fast enough.
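
Putting the two halves together, the whole approach can be sketched in Python as follows (names and the tiny example are illustrative, not taken from the reference solutions):

```python
def precompute(a, B):
    # pre[x] holds A_x for x in [0, 2^B): the "lower bits" table.
    n = len(a)
    pre = [a]
    for _ in range((1 << B) - 1):
        prev = pre[-1]
        pre.append([prev[i] ^ prev[i + 1] for i in range(n - 1)] + [prev[-1]])
    return pre

def fast_query(pre, B, M, t, p):
    # p is 1-based. The lower B bits of t select the precomputed row;
    # the upper bits are enumerated as submasks, each picking a block.
    n = len(pre[0])
    t %= M
    low, high = t & ((1 << B) - 1), t >> B
    result = 0
    s = high
    while True:
        pos = (p - 1) + (s << B)    # jump straight to this block
        if pos < n:
            result ^= pre[low][pos]
        if s == 0:
            break
        s = (s - 1) & high
    return result

a = [1, 2, 4, 8, 16]
pre = precompute(a, 1)               # block size 2^1 = 2, period M = 8
print(fast_query(pre, 1, 8, 5, 1))   # 19, same answer as before
```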

For quickly iterating over the submasks of a value, see the following pseudocode:

for (i = mask; ; i = (i - 1) & mask) {
    // do something with submask i
    if (i == 0) break;
}

Refer more about iterating over submasks here.
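
In Python, the same enumeration can be written as a small generator (an illustrative sketch):

```python
def submasks(mask):
    # Yields every submask of mask in decreasing order, ending with 0.
    i = mask
    while True:
        yield i
        if i == 0:
            break
        i = (i - 1) & mask

print(list(submasks(5)))   # [5, 4, 1, 0]
```

The trick is `i = (i - 1) & mask`: subtracting 1 flips the trailing bits, and the `& mask` keeps only bits belonging to `mask`, which steps to the next smaller submask.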

Related problems

A problem based on a very similar idea can be found here. There was another problem on the very same idea in one of the past contests at CodeChef, which I cannot find now. If anyone has the link, please post it in the comments.

Another recent problem, which requires iterating over the submasks of a number together with dynamic programming, can be found here.

Time Complexity

The time complexity is $O(N \cdot 2^B)$ for precomputation and $O(Q \cdot N/2^B)$ for answering queries. Choosing $B = 7$ makes the solution fast enough to fit inside the time limit.

The memory complexity is $O(N \cdot 2^B)$ for storing the precomputed table.


Setter's solution
Tester's solution
Editorialist's solution

Feel free to share your approach if it differs. Suggestions are always welcome. :)


asked 04 Nov, 14:36




Well... there seems to be another solution, but I haven't finished coding it yet.

Notice that $a_i$ contributes to query$(k, x)$ if and only if $C(k, i-x) \bmod 2 = 1$, which means $i-x$ is a submask of $k$ (by a "parity of binomial coefficients" theorem).

So the queries are equivalent to "find all i that (i-x) is a submask of k and xor all a_i".

When x=0, this can be solved by precalculation. (Case 1)

When $k = 2^d - 1$, this can be solved by enumerating how many prefix (binary) bits of $i-x$ and $k$ are equal, reducing to Case 1. (Case 2.)

Otherwise, we take all maximal runs of consecutive 1-bits ("blocks") in $k$. To obtain $i$ we need to add $x$ to $i-x$. Suppose there are $d$ blocks; we spend $2^d$ time enumerating, for each block, whether the addition carries a '1' out of it into the next bit. This splits the problem into several independent Case 2 subproblems; we simply combine the results of each, and we're done.

Let the lengths of the blocks be $l_1, l_2, \dots, l_k$. Then the time complexity proves to be $(l_1+1)(l_2+1)\cdots(l_k+1)$. Not so obvious, though.

Notice that $(l_1+1)+(l_2+1)+\dots+(l_k+1) \leq \log_2 n$, where $n$ is the array length. So the worst-case complexity per query is roughly $O(n^{0.5})$; however, I doubt the test data contains such extreme cases. I guess it will run really fast on average.


Seems that the theorem is not included on Wikipedia; however, it appears on page 19 of this lecture note.


answered 07 Nov, 17:47




Replaced (x-i) with (i-x) everywhere.

Seems interesting. Can you find that link? Because the above solution uses that theorem too.

(07 Nov, 21:07) taran_1407