Problem: https://www.codechef.com/problems/FXSEQ

Submission : https://www.codechef.com/viewsolution/33236792

what could be the possible reason for TLE ?

Your matrix multiplication is the bottleneck, and you do a lot of modding on that one line (line 164) - you should try replacing it with something like `P[i][j] = (P[i][j] + A[i][k] * B[k][j]) % mod;`

It won’t, because you can mod everything in the matrix beforehand. The idea is you don’t want O(T\log{10^{18}} \cdot 2^3 \cdot 6) mod operations, but you can do some extra mods elsewhere without much cost.

thanks a lot that’s an interesting optimization

Is it because of `vector`

?

Might be, I actually don’t know why it’s so slow (seems to be 2-3 seconds), maybe the complexity is just bad

Is there any better generic Matrix template that passes these test cases?

I feel like your manual unrolling of the loop was what made it so fast. My matrix template times out too lol. I’m interested to see the official solution and why the constraints were to tight…

Yeah holy crap. Making it an array, not a `vector`

, speeds my code up by 7x.

How to replace vector with array in my generic template ?

So I was going to modify my template to use an array and share it as an example of how to do it. But I’m stuck on WA and am not finding a countercase, even when testing against the setter’s solution. So I’ll get back to you.

edit: Figured it out. I was copying your solution wrong

Here’s the template:

## Matrix

```
/* note: additional optimization - first take everything modulo mod^2, using if (x >= mod^2) x -= mod^2, then take everything modulo mod at the end */
template<int n>
struct matrix {
using TYPE = ll;
TYPE v[n][n];
matrix() {
memset(v, 0, sizeof(v));
}
matrix<n> mul(matrix &b) {
matrix<n> res;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
for (int k = 0; k < n; k++) {
res.v[i][j] = (res.v[i][j] + v[i][k] * b.v[k][j]) % mod;
}
}
}
return res;
}
matrix<n> pow(matrix<n> &a, long long x) {
matrix<n> res;
for (int i = 0; i < n; i++) res.v[i][i] = 1;
while (x) {
if (x & 1) {
res = res.mul(a);
}
x /= 2;
a = a.mul(a);
}
return res;
}
#pragma message("be careful with mod in matrix")
};
```

Accepted submission (0.22s)

Any reason for \mod MOD ^ 2 in the note section

It’s because at any point, assuming you maintain that the two matrices you multiply have already been modulo’d, any value in the resultant matrix will not change by more than mod^2 each operation. So you can maintain the numbers modulo mod^2 by using this type of modulo operation:

```
if (x >= mod * mod) {
x -= mod * mod;
}
```

and take everything modulo mod at the end, so you’ll end using O(n^2) modulo operations in total instead of O(n^3). I feel like this is dangerous for some reason, so I won’t template it by default, but I have it as a note in case even more speedup is necessary.

But what if we end up with overflow while multiplying two numbers less than MOD^2?

It should be true that any two values you multiply are less than mod. Notice that in matrix multiplication, the *only* values you actually use are those from the two matrices you multiply together.

what if summation crosses the LLONG\_MAX just before the MOD^2 check ?