### PROBLEM LINK:

**Setter:** Bogdan Ciobanu

**Tester:** Misha Chorniy

**Editorialist:** Taranpreet Singh

### DIFFICULTY:

Medium

### PREREQUISITES:

Xor operation, Precomputation, Meet in the middle.

### PROBLEM:

Given a sequence A_0 of length N indexed 1-based, we define an operation f on array as A_x[i] = A_{x-1}[i] \oplus A_{x-1}[i+1] \forall i \in [1, N-1] and A_{x}[N] = A_{x-1}[N].

We have to answer queries: Given T and P, answering the value of A_T[p]. Note that T \leq 10^9.

### SUPER QUICK EXPLANATION

- Simple but slow solution will be to build the whole table A_x which results in O(N^2) time.
- We will precompute answer for lower bits for every position, say for 2^B values.
- Then for every query, we will skip directly over blocks of 2^B and use precomputes values to get xor-sum of values to be included from blocks block.

### EXPLANATION

For this problem Iâ€™ll be discussing two solutions, one slow one to illustrate how this process actually works, followed by another slow solution, which we will, at last, optimize by precomputation.

First of all, letâ€™s make an observation, Sequence A_x is same as A_0 if x is a multiple of power of two > N, that is, period of sequence A is M = 2^{\lfloor \log_2 N \rfloor +1}.

**Proof:** Reason is that, after M operations, Each elements effect on elements to their left gets neutralised because of property A \oplus B \oplus B = A. Due to this property, the value A[j] is xored in A_x[i] if, after x operations, A[j] is xored with A[i] odd number of times.

See example array 1 2 4.

0: 1 2 4 8 16

1: 3 6 12 24 16

2: 5 10 20 8 16

3: 15 30 28 24 16

4: 17 2 4 8 16

5: 19 6 12 24 16

6: 21 10 20 8 16

7: 31 30 28 24 16

8: 1 2 4 8 16

9: 3 6 12 24 16

See, last element never changes, last 2 element changes with period 2, last 3 as well as last 4 elements changes with period 4 while if N = 5, sequence repeat after 8 operations which is same as M = 2^{\lfloor \log_2 N \rfloor +1}.

**First slow solution**

We make a table B[x][i] where B[x][i] represent A[x][i]. We can calculate this table for x upto M where M = 2^{\lceil \log_2 N \rceil}. Since after M, the sequnce repeat, we can always take T \bmod M and answer the query. It is easy to calculate the whole matrix in O(N*M) time, which will time out.

**Second slow solution**

Consider a query (T, P).

**Lemma:** Let us consider all submasks of T when written in binary form, say mask. Now, the Final answer will be Xor sum of A[p+mask] for all submasks of T, for all valid positions (p+mask \leq N).

**Proof:** Consider the following table. Here ith column in Tth row being 1 means, that B[T][pos] include xor-sum of A[pos+i]. (Columns are numbered from 0).

0: 1

1: 1 1

2: 1 0 1

3: 1 1 1 1

4: 1 0 0 0 1

5: 1 1 0 0 1 1

6: 1 0 1 0 1 0 1

7: 1 1 1 1 1 1 1 1

8: 1 0 0 0 0 0 0 0 1

9: 1 1 0 0 0 0 0 0 1 1

10:1 0 1 0 0 0 0 0 1 0 1

11:1 1 1 1 0 0 0 0 1 1 1 1

12:1 0 0 0 1 0 0 0 1 0 0 0 1

13:1 1 0 0 1 1 0 0 1 1 0 0 1 1

14:1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

15:1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The first column has 1 always, Becuase A[X][P] will always include contribution of A[0][P].

After 0 operations, A[0][P] is just as given in input.

After 1 operation, A[1][P] = A[0][P] \oplus A[0][P+1].

After 2 operations, A[2][P] = A[1][P] \oplus A[1][P+1] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+1] \oplus A[0][P+2] = A[0][P] \oplus A[0][P+2]

After 3 operations, A[3][P] = A[2][P] \oplus A[2][P+1] = A[0][P] \oplus A[0][P+2] \oplus A[0][P+1] \oplus A[0][P+3]

If we simulate this process forward, we can see, that in next step, A[4][P] = A[0][P] \oplus A[0][P+4].

**This works the same way as sub-masks of T.**

At T = 1, Submasks of T are 0 and 1, which means, A[1][P] = A[0][P] \oplus A[0][P+1]

At T = 2, Submasks of T are 0 and 2, which means, A[2][P] = A[0][P] \oplus A[0][P+2]

At T = 3, Submasks of T are 0, 1, 2 and 3, which means, A[3][P] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+2] \oplus A[0][P+3]

We can see, how the contribution of A[0][P+1] disappeared, just the same way, 1 is a submask of 1, while 1 is not a sub-masks of 2.

We can use this fact to make another slow solution, where for a given query (T, P) we check for all positions X \in [P, N] and if X-P is a sub-mask of T, we take its xor-sum.

This solution requires no preprocessing, but takes O(N) time per query, making the solution run in O(Q*N) time, once again exceeding the time limit.

**Faster solution**

We will combine both solutions. First of all, Observe the following in above table.

0: 1 0 0 0

1: 1 1 0 0

2: 1 0 1 0

3: 1 1 1 1

We can see, that for every 4 values of T, the positions included in the range [P, P+4) remain the same because these values have a cycle of 4.

Now, **If we precompute the table for the first 4 values, we can skip directly over 4 positions.**

See, Suppose we have T = 5 for some position P. It will imply A[5][P] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+4] \oplus A[0][P+5].

Now, see last two bits of T, when written in binary form. They are 01 which is 1.

The thing to understand here is, that A[1][P] = A[0][P]+A[0][P+1] and in same way, A[1][P+4] = A[0][P+4] \oplus A[0][P+5].

Hence, A[5][P] can also be written as A[1][P] \oplus A[1][P+4].

Hence, A[5][P] = A[1][P] \oplus A[1][P+4]. The reason it works is, that if we have calculated the table till values X, X = 2^B for some B,**We have actually divided the array into blocks of size X**. Lower B bits of T decide, which values to be included in the answer from every block while upper bits decide, which blocks to be included.

Suppose we have removed the lower B bits of T, now, once again, by similar reasoning, we see that values included in xor-sum will xor value from blocks which are sub-masks of T. The lower bits automatically choose the right value from blocks.

See, Let T = 14 = 1110_2. Lower B bits of T is 10 which is 2, and T after removing lower B bits is 11 which is 3. Now, We can rewrite A[14][P] as A[2][P] \oplus A[2][P+1*(2^B)] \oplus A[2][P+2*(2^B)] \oplus A[2][P+3*(2^B)].

The reason for multiplying by 2^B is that we skip directly to the next block which is at 2^B steps from the current block. For every block, A[lb][P] includes xor-sum of all values from the block of size 2^B.

Here too, we iterate over submasks of T after removing lower B bits of T, and for the submask, we include the required value from the block corresponding to that sub-mask.

Hence, we manage to solve the problem in O(N*X + Q*N/X) for X = 2^B. By choosing a suitable value of B, we can solve this problem fast enough.

For quickly iterating over a submask of a value, see the following pseudocode

```
for( i = mask; true; i = (i-1) & mask){
//Do something
if i == 0: break
}
```

Refer more about iterating over submasks here.

### Related problems

A problem on very similar idea can be found here. There was another problem on very same idea in one of the past contests at codechef, i cannot find now. If anyone has link, please post in comments.

Another recent problem, which required iterating over submasks of a number along with dynamic programming, can be found here.

### Time Complexity

Time complexity is O(N*(2^B)) for precomputation and O(Q*N/(2^B)) for answering queries. Choosing Value of B = 7, we can get the solution to run fast enough to fit inside the time limit.

Memory Complexity of this solution is O(N*(2^B)) for storing the precomputed table.

### AUTHORâ€™S AND TESTERâ€™S SOLUTIONS:

Setterâ€™s solution

Testerâ€™s solution

Editorialistâ€™s solution

Feel free to Share your approach, If it differs. Suggestions are always welcomed.