TLE in FXSEQ

anon8049083 · May 23, 2020, 1:42am

Problem: FXSEQ Problem - CodeChef
Submission : CodeChef: Practical coding for everyone
What could be the reason for the TLE?

galencolin · May 23, 2020, 2:55am

Your matrix multiplication is the bottleneck, and you do a lot of modding on that one line (line 164) - you should try replacing it with something like P[i][j] = (P[i][j] + A[i][k] * B[k][j]) % mod;

anon8049083 · May 23, 2020, 3:21am

@galencolin what if A[i][k] * B[k][j] overflows?

galencolin · May 23, 2020, 3:21am

It won’t, because you can mod everything in the matrix beforehand. The idea is you don’t want O(T\log{10^{18}} \cdot 2^3 \cdot 6) mod operations, but you can do some extra mods elsewhere without much cost.
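
A minimal sketch of why the single-mod form does not overflow, assuming a modulus around 10^9 (the value `MOD` and the helper name `mul_add` are assumptions for illustration, not taken from the submission). If the accumulator and both factors are already below the modulus, the largest intermediate value is (MOD-1)^2 + (MOD-1), which fits in a signed 64-bit integer:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical modulus; the thread never states the problem's actual value.
constexpr std::int64_t MOD = 1'000'000'007;

// Largest intermediate value when acc, a and b are all in [0, MOD).
constexpr std::int64_t WORST = (MOD - 1) * (MOD - 1) + (MOD - 1);
static_assert(WORST < std::numeric_limits<std::int64_t>::max(),
              "one multiply-add must fit in 64 bits");

// One step of the inner loop: a single % per multiply-add.
// Precondition: 0 <= acc, a, b < MOD.
constexpr std::int64_t mul_add(std::int64_t acc, std::int64_t a, std::int64_t b) {
    return (acc + a * b) % MOD;
}
```

The precondition is the important part: entries read from input or built by hand need one `% MOD` before the first multiplication.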

anon8049083 · May 23, 2020, 3:23am

thanks a lot that’s an interesting optimization

anon8049083 · May 23, 2020, 3:34am

https://www.codechef.com/viewsolution/33237439
I’m still getting TLE
@galencolin

anon8049083 · May 23, 2020, 3:40am

Is it because of vector?

galencolin · May 23, 2020, 3:42am

Might be, I actually don’t know why it’s so slow (seems to be 2-3 seconds), maybe the complexity is just bad

anon8049083 · May 23, 2020, 3:45am

Using struct with 4 variables gives 0.23 AC
https://www.codechef.com/viewsolution/33160937
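
The linked submission is not reproduced here. As a rough sketch of what "a struct with 4 variables" could look like for 2x2 matrices, with a hand-unrolled product and no heap allocation (the names `Mat2`, `mul`, `power` and the value of `MOD` are hypothetical):

```cpp
#include <cstdint>

using ll = std::int64_t;
constexpr ll MOD = 1'000'000'007;  // hypothetical modulus

// 2x2 matrix stored as four plain fields:
// | a b |
// | c d |
struct Mat2 {
    ll a, b, c, d;
};

// Hand-unrolled product; assumes all entries are already in [0, MOD).
// The first product is reduced before the second is added, so the
// intermediate value never exceeds (MOD-1) + (MOD-1)^2.
inline Mat2 mul(const Mat2 &x, const Mat2 &y) {
    return Mat2{
        (x.a * y.a % MOD + x.b * y.c) % MOD,
        (x.a * y.b % MOD + x.b * y.d) % MOD,
        (x.c * y.a % MOD + x.d * y.c) % MOD,
        (x.c * y.b % MOD + x.d * y.d) % MOD,
    };
}

// Binary exponentiation; base is taken by value so the caller's copy is untouched.
inline Mat2 power(Mat2 base, unsigned long long e) {
    Mat2 res{1, 0, 0, 1};  // identity matrix
    while (e) {
        if (e & 1) res = mul(res, base);
        base = mul(base, base);
        e >>= 1;
    }
    return res;
}
```

For example, `power(Mat2{1, 1, 1, 0}, k)` holds consecutive Fibonacci numbers modulo `MOD`, which makes it easy to sanity-check against small known values.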

anon8049083 · May 23, 2020, 3:49am

Is there any better generic Matrix template that passes these test cases?

galencolin · May 23, 2020, 4:03am

I feel like your manual unrolling of the loop was what made it so fast. My matrix template times out too lol. I’m interested to see the official solution and why the constraints were so tight…

anon8049083 · May 23, 2020, 4:04am

Official Editorial: Editorial: Programming Contest #1 (Chapter 'Ramayan') - Codeforces

galencolin · May 23, 2020, 4:14am

Yeah holy crap. Making it an array, not a vector, speeds my code up by 7x.

anon8049083 · May 23, 2020, 4:16am

How do I replace vector with an array in my generic template?

galencolin · May 23, 2020, 4:38am

So I was going to modify my template to use an array and share it as an example of how to do it. But I’m stuck on WA and am not finding a countercase, even when testing against the setter’s solution. So I’ll get back to you.

edit: Figured it out. I was copying your solution wrong
Here’s the template:

Matrix

/* note: additional optimization - first take everything modulo mod^2, using if (x >= mod^2) x -= mod^2, then take everything modulo mod at the end */
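#include <cstring>  // for memset
/* assumes `ll` (a 64-bit integer type) and a global `mod` are defined elsewhere */
template<int n>
struct matrix {
  using TYPE = ll;
  TYPE v[n][n];

  // zero-initialise all entries
  matrix() {
    memset(v, 0, sizeof(v));
  }

  // returns (*this) * b; entries of both operands must already be in [0, mod)
  matrix<n> mul(const matrix<n> &b) const {
    matrix<n> res;
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
          res.v[i][j] = (res.v[i][j] + v[i][k] * b.v[k][j]) % mod;
        }
      }
    }
    return res;
  }

  // returns a^x by binary exponentiation; a is taken by value so the
  // caller's matrix is not overwritten by the repeated squaring
  matrix<n> pow(matrix<n> a, long long x) const {
    matrix<n> res;
    for (int i = 0; i < n; i++) res.v[i][i] = 1;  // identity

    while (x) {
      if (x & 1) {
        res = res.mul(a);
      }
      x /= 2;
      a = a.mul(a);
    }
    return res;
  }
  #pragma message("be careful with mod in matrix")
};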

Accepted submission (0.22s)

anon8049083 · May 23, 2020, 4:47am

Any reason for taking values modulo MOD^2 in the note section?

galencolin · May 23, 2020, 4:50am

It’s because, as long as the two matrices you multiply have already been reduced modulo mod, every product added to an entry of the result is less than mod^2, so no entry grows by more than mod^2 per addition. So you can maintain the numbers modulo mod^2 by using this type of modulo operation:

if (x >= mod * mod) {
  x -= mod * mod;
}

and take everything modulo mod at the end, so you’ll end up using O(n^2) modulo operations in total instead of O(n^3). I feel like this is dangerous for some reason, so I won’t template it by default, but I have it as a note in case even more speedup is necessary.
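
A minimal sketch of that lazy reduction, assuming a modulus around 10^9 so that 2 * mod^2 still fits in a signed 64-bit integer (the names `Mat`, `mul_lazy`, `MOD`, `MOD2` and the modulus value are hypothetical, not part of the template above). Since mod divides mod^2, reducing modulo mod^2 along the way and modulo mod once at the end gives the same result as reducing modulo mod at every step:

```cpp
#include <cstdint>
#include <limits>

using ll = std::int64_t;
constexpr ll MOD = 1'000'000'007;  // hypothetical modulus
constexpr ll MOD2 = MOD * MOD;     // about 1.0e18

// Before the check, the accumulator is below MOD2 plus one product (< MOD2),
// so it stays under 2 * MOD2, which must fit in a signed 64-bit integer.
static_assert(MOD2 <= std::numeric_limits<ll>::max() / 2,
              "2 * MOD^2 must not overflow");

template <int N>
struct Mat {
    ll v[N][N] = {};  // zero-initialised
};

// Precondition: all entries of a and b are in [0, MOD).
template <int N>
Mat<N> mul_lazy(const Mat<N> &a, const Mat<N> &b) {
    Mat<N> res;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            ll acc = 0;
            for (int k = 0; k < N; k++) {
                acc += a.v[i][k] * b.v[k][j];  // adds less than MOD2
                if (acc >= MOD2) acc -= MOD2;  // conditional subtract, no %
            }
            res.v[i][j] = acc % MOD;  // one real % per entry
        }
    }
    return res;
}
```

The `static_assert` is the guard against the "dangerous" case: with a larger modulus the sum could exceed the 64-bit range before the check runs, and the sketch would refuse to compile rather than silently overflow.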

anon8049083 · May 23, 2020, 4:53am

But what if we end up with overflow while multiplying two numbers less than MOD^2?

galencolin · May 23, 2020, 4:54am

It should be true that any two values you multiply are less than mod. Notice that in matrix multiplication, the only values you actually use are those from the two matrices you multiply together.

anon8049083 · May 23, 2020, 4:57am

What if the summation crosses LLONG_MAX just before the MOD^2 check?