I submitted the problem using segment trees but got 70 points only…then i saw the inherent pattern in the queries…that were being generated by the formulae…by the corresponding N values…

and came with a faster dp approach but still got 70 points…

finally i wrote a sparse table…for quering in O(1) …but still got 70 points…

people got AC using sparse table…where did i go wrong?

here is my solution CodeChef: Practical coding for everyone

For performance improvements, you need to use int M[17][maxi]. This makes your code much faster.

First of all, you have stuff like “(b+11)%(t);” there. % operation is very slow; try to rewrite it using subtraction instead. Also you may change M[maxi][17] to M[17][maxi] to avoid large jumps over memory.

Did anyone actually used any maths to solve this problem ?

@karanaggarwal : yes i tried solving it using maths…first i found the pattern in queries…

no matter what the initial x and y were the query pattern repeated after N-1 for x and after N for y

when x=(x+7)%(t-1)

and y=(y+11)%t;

that means the queries will repeat exactly after N*(N-1) queries…so if M was greater than N*(N-1)…

q=N*(N-1)

i calculated only the sum of N*(N-1) queries and multipled it M/q times…and processed the remaining M%q queries…

also queries for x depended on N in such way that if(N%7==1)…then x repeated after N/7 times…

and queries for y depended on N in such way that if(N%11==0)…then y repeated after N/11 times…

so the q term would be (number of time x repeated) * (no. of times y repeated)…

and then using above dp approach…but it still fell apart when the N was neither N%7==1 or N%11==0

thats why i used sparse tables

i can’t understand the purpose behind asking this question… ? First of all giving 70 points just for writing a classic segment tree solution… and even worse…100 points for just optimising the sparse table solution… !

I used the observation that for a particular X suppose the initial Y mapping was Y0 , Then after each subsequent cycles X would be mapped to Y0 - 11 , Y0 - 2*11 and so on . So I could skip all the queries that mapped X to Y later than the leftmost position of maxima between X and Y0 using a simple formula .

What would be the complexity of your solution ?

I used this same observation, for each x find sum of [x,y] [x,y-11] and so on . The queries would be uniformly distributed among all x excluding those which are never visited. So, we can find no. of Y’s for each x and calculate individual sum for all x’s.

However complexity-wise this would still be O(M) assuming we use sparse tables.But apparently, this is a better optimized version than going for all the M queries normally because for a chunk of queries, we are keeping x same,only changing Y. This makes significant difference in time of execution.

Plz post the editorial …of this problem …

Can you explain why?

Simple answer: slow RAM, fast CPU and CPU caches. The problem is, that current CPUs are much faster than RAM - see this article for more information about the CPU-RAM gap. As a result, a CPU access to RAM costs a lot of CPU cycles. To reduce this problem, current CPUs have several levels of caches, which usually work best, when accessing memory addresses, that are close to each other.

You can try a simple example. Try running this program and measure its execution time:

```
#define N 10000
int a[N][N];
int main(void) {
for (int i=0;i<N;i++) {
for (int j=0;j<N;j++) {
a[i][j] = 0;
}
}
}
```

Then switch indices used to access the array - change `a[i][j] = 0`

to `a[j][i] = 0`

- and measure the time again. Depending on your machine and compiler optimizations, the second version will be slower (on my machine it is about 5 times slower). You can try the same with different values of `N`

.

In the first version, you access the array (or matrix) row-by-row. As each row is stored as a continuous memory block, you start with the lowest memory address and continue up to the highest memory address assigned to the array. This type of access is cache-friendly. On the other hand, in the second version you access the array column by column, which means skipping to a distant memory address in each iteration.

As for the original question, one of the problems could also be the `log2`

function, which seems to be quite slow. Try to replace it with a lookup table. See my solution for more details.