PANIC - Editorial

PROBLEM LINK:

Practice
Contest

Setter: Shahjalal Shohag
Tester: Istvan Nagy
Editorialist: Shahjalal Shohag

DIFFICULTY:

Hard

PREREQUISITES:

Berlekamp-Massey, Cayley–Hamilton theorem, Matrix Exponentiation

PROBLEM:

You are given a K \times K integer matrix M and three integers N, a, d. Find
S(N) = \sum_{i = 0}^{N}{F_{a + i \cdot d} \cdot M^i}. Here, F_i is the i^{th} Fibonacci number.

QUICK EXPLANATION:

S(0), S(1), S(2), \ldots satisfies a linear recurrence of order 2 \cdot K + 1. Find the recurrence using Berlekamp-Massey after compressing the matrices into integers, and then, compute S(N) with the help of Cayley-Hamilton Theorem.

EXPLANATION:

Let (A_n) denote the sequence A_0, A_1, A_2, \dots

  • Lemma 1: If (A_n) satisfies a linear recurrence of order K, then the sequence (A_{m\cdot n + r})n \in\mathbb{N}, for fixed m \in\mathbb{N} and r \in\mathbb{N}_0 = \mathbb{N} ∪ \{0\}, also satisfies a linear recurrence of order K.
    Proof

    smash me

As the Fibonacci sequence (F_n) is of order 2, from lemma 1, (F_{m \cdot n + r}) also satisfies a linear recurrence of order 2.

SUBTASK 1:

As (F_{m \cdot n + r}) satisfies a linear recurrence of order 2, we can find the recurrence [X, Y] assuming F_{a + i \cdot d} = X \cdot F_{a + (i - 1) \cdot d} + Y \cdot F_{a + (i - 2) \cdot d} by solving the following two equations:

  • F_{a + 2 \cdot d} = X \cdot F_{a +d} + Y \cdot F_{a}
  • F_{a +3 \cdot d} = X \cdot F_{a + 2 \cdot d} + Y \cdot F_{a + d}

You can also use Berlekamp-Massey to find the recurrence.

As K = 1, we can consider the matrix M as an integer. Let P = M_{1, 1}. Then,

\begin{bmatrix} X \cdot P & Y \cdot P^2 & 0 \\ 1 & 0 & 0\\ 1 & 0& 1 \\ \end{bmatrix} \cdot \begin{bmatrix} F_{a + (i - 1) \cdot d} \cdot P^{i - 1}\\ F_{a + (i - 2) \cdot d} \cdot P^{i - 2}\\ S(i - 2)\\ \end{bmatrix} = \begin{bmatrix} F_{a + i \cdot d} \cdot P^{i}\\ F_{a + (i - 1) \cdot d} \cdot P^{i - 1}\\ S(i - 1)\\ \end{bmatrix}

So,

\begin{bmatrix} X \cdot P & Y \cdot P^2 & 0 \\ 1 & 0 & 0\\ 1 & 0& 1 \\ \end{bmatrix} ^ N \cdot \begin{bmatrix} F_{a + d} \cdot P\\ F_a\\ S(0)\\ \end{bmatrix} = \begin{bmatrix} F_{a + (N + 1) \cdot d} \cdot P^{N + 1}\\ F_{a + N \cdot d} \cdot P^{N}\\ S(N)\\ \end{bmatrix}

So we can use matrix exponentiation to solve the problem! Refer to my solution for more clarity.

Time complexity: \mathcal{O}(27 \cdot log(N))

SUBTASK 2:

The problem here is that we have to deal with matrices now. But the cool part is that we can solve this subtask using the exact same idea of subtask 1 :astonished:. Let Z be the zero matrix of size K and I be the identity matrix of size K,

I = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots &1\\ \end{bmatrix}

We can use the exact same formula of the previous subtask but with a twist:

\begin{bmatrix} X \cdot M & Y \cdot M^2 & Z \\ I & Z & Z\\ I & Z& I \\ \end{bmatrix} \cdot \begin{bmatrix} F_{a + (i - 1) \cdot d} \cdot M^{i - 1}\\ F_{a + (i - 2) \cdot d} \cdot M^{i - 2}\\ S(i - 2)\\ \end{bmatrix} = \begin{bmatrix} F_{a + i \cdot d} \cdot M^{i}\\ F_{a + (i - 1) \cdot d} \cdot M^{i - 1}\\ S(i - 1)\\ \end{bmatrix}

So,

\begin{bmatrix} X \cdot M & Y \cdot M^2 & Z \\ I & Z & Z\\ I & Z& I \\ \end{bmatrix} ^ N \cdot \begin{bmatrix} F_{a + d} \cdot M\\ F_a \cdot I\\ S(0)\\ \end{bmatrix} = \begin{bmatrix} F_{a + (N + 1) \cdot d} \cdot M^{N + 1}\\ F_{a + (N) \cdot d} \cdot M^{N}\\ S(N)\\ \end{bmatrix}

It is easy to see why it works. So we can use matrix exponentiation on this 3\cdot K \times 3 \cdot K matrix and find S(N). Refer to my solution for more clarity.

Time Complexity: \mathcal{O}(27 \cdot K^3 \cdot log(N))

SUBTASK 3:

  • Lemma 2: If (A_n) satisfies a linear recurrence of order K, then the prefix sum of A (\sum_{i = 0}^{n}A_i) satisfies a linear recurrence of order K + 1. The proof is left an exercise to the reader!

  • Lemma 3: If (A_n) and (B_n) satisfy linear recurrences of order K_1 and K_2 respectively, then (A_n\cdot B_n) satisfies a linear recurrence of order K_1 \cdot K_2. You can look here to enlighten yourself.

  • Lemma 4: Let M be a K\times K square matrix. Then (M^n) satisfies a linear recurrence of order K. This is a direct consequence of the Cayley–Hamilton theorem.

From lemma 1, (F_{m \cdot n + r}) satisfies a linear recurrence of order 2. From lemma 4, (M^n) satisfies a linear recurrence of order K. Then, from lemma 3, (F_{m \cdot n + r} \cdot M^n) satisfies a linear recurrence of order 2 \cdot K. Finally, from lemma 2, (\sum_{i=0}^{n}F_{m \cdot i + r} \cdot M^i) satisfies a linear recurrence of order 2 \cdot K + 1. That means S(0), S(1), S(2), \dots satisfies a linear recurrence of order 2 \cdot K + 1.

  • Lemma 5: If (A_n) satisfies a linear recurrence of order K, then it is enough to apply Berlekamp-Massey on the first 2 \cdot K elements to extract the correct linear recurrence.

So we can compute the first 4 \cdot K + 2 matrices S(0), S(1), S(2), \dots, S(4\cdot K + 2) and extract the linear recurrence using Berlekamp-Massey. You can find a good tutorial on Berlekamp-Massey here.

But the hassle is that now we have to work with matrices. We have to find the correct recurrence of S(0), S(1), S(2), \ldots but each of the elements is a matrix. What can we do? Notice that if we multiply the elements by a constant integer then it won’t make any problem. But what if we multiply each of them by a constant matrix :astonished:? We will do it in a clever way to benefit ourselves:

  • Let R be an 1 \times K matrix and C be a K \times 1 matrix and each of the elements in these matrices be a random integer. Then change the sequence in the following way:
    R \cdot (S(0) \cdot C),R \cdot (S(1) \cdot C), R \cdot (S(2) \cdot C), \ldots

What good will this bring to humanity? Let’s look at the final size of each matrix: [1 \times K] \cdot ([K \times K] \cdot [K \times 1]) \longrightarrow [1 \times K] \cdot [K \times 1] \longrightarrow [1 \times 1]. Voila! We have compressed the matrices into integers.

As we have integers, we can now use Berlekamp-Massey to find the recurrence. We will get the correct recurrence with high probability. Note that we can verify the recurrence by just checking the first few matrices. In practice, it doesn’t require more than one iteration to get the correct recurrence :astonished:!

Now, as we have the correct recurrence, we can find S(N) in \mathcal{O}(K^2log( N) + K^3) . You can look here to know how to do that. In short, S(N) can be represented as a linear combination of S(0), S(1), \ldots, S(2 \cdot K + 1). Find the combination using binary exponention in \mathcal{O}(K^2 log(N)), and then, extract the matrix S(N) in \mathcal{O}(K^3).

Time Complexity Analysis:

  • Precomputing S(i) for each 0 \le i \le 4K + 2 can be done in \mathcal{O}(K^4)
  • The P^{th} Fibonacci number F_P can be computed in \mathcal{O}(log (P))
  • Berlekamp-Massey is \mathcal{O}(K^2)
  • Linear recurrence using Cayley-Hamilton theorem is \mathcal{O}(K^2log( N) + K^3)
  • Overall Complexity: \mathcal{O}(K^4+K^2log(N))

ALTERNATE SOLUTION:

We will shamelessly exploit the fact that the given linear recurrence is the Fibonacci sequence!

Let, \varphi = \frac{1 + \sqrt{5}}{2}. So, F_n = \frac{\varphi^n - (1 - \varphi)^n}{\sqrt{5}}

Now, S(N) = \sum_{i = 0}^{N}{F_{a + i \cdot d} \cdot M^i}

= \sum_{i = 0}^{N}{\frac{\varphi^{a + i \cdot d} - (1 - \varphi)^{a + i \cdot d}}{\sqrt{5}}\cdot M^i}

= \frac{1}{\sqrt{5}}\sum_{i = 0}^{N}{(\varphi^{a + i \cdot d} - (1 - \varphi)^{a + i \cdot d})\cdot M^i}

= \frac{1}{\sqrt{5}}\sum_{i = 0}^{N}{\varphi^{a + i \cdot d}\cdot M^i - \frac{1}{\sqrt{5}}(1 - \varphi)^{a + i \cdot d}\cdot M^i}

= \frac{\varphi^a}{\sqrt{5}}\sum_{i = 0}^{N}{(\varphi^{d}\cdot M)^i - \frac{(1 - \varphi)^a}{\sqrt{5}}\sum_{i = 0}^{N}{((1 - \varphi)^{d}\cdot M)^i}}

Here, \sqrt{5} doesn’t exist under the given modulo. So let us perform the calculations under the field (x + y \sqrt{5}). We can do basic arithmetics easily under this field.

So we just need to find the summation of the geometric progression of a square matrix. To do it faster than \mathcal{O}(K^3 log(N)), we need to find the characteristic polynomial of the matrix and use Cayley-Hamilton theorem to solve the problem in \mathcal{O}(K^4+K^2log(N)). You can look here or the details of the explanation of the subtask 3 to enlighten yourself.
Implementation: smash me

SOLUTIONS:

Setter's Solution for Subtask 1
#include<bits/stdc++.h>
using namespace std;

const int N = 105, mod = 998244353;

template <const int32_t MOD>
struct modint {
    int32_t value;
    modint() = default;
    modint(int32_t value_) : value(value_) {}
    inline modint<MOD> operator + (modint<MOD> other) const { int32_t c = this->value + other.value; return modint<MOD>(c >= MOD ? c - MOD : c); }
    inline modint<MOD> operator - (modint<MOD> other) const { int32_t c = this->value - other.value; return modint<MOD>(c <    0 ? c + MOD : c); }
    inline modint<MOD> operator * (modint<MOD> other) const { int32_t c = (int64_t)this->value * other.value % MOD; return modint<MOD>(c < 0 ? c + MOD : c); }
    inline modint<MOD> & operator += (modint<MOD> other) { this->value += other.value; if (this->value >= MOD) this->value -= MOD; return *this; }
    inline modint<MOD> & operator -= (modint<MOD> other) { this->value -= other.value; if (this->value <    0) this->value += MOD; return *this; }
    inline modint<MOD> & operator *= (modint<MOD> other) { this->value = (int64_t)this->value * other.value % MOD; if (this->value < 0) this->value += MOD; return *this; }
    inline modint<MOD> operator - () const { return modint<MOD>(this->value ? MOD - this->value : 0); }
    modint<MOD> pow(uint64_t k) const { modint<MOD> x = *this, y = 1; for (; k; k >>= 1) { if (k & 1) y *= x; x *= x; } return y; }
    modint<MOD> inv() const { return pow(MOD - 2); }  // MOD must be a prime
    inline modint<MOD> operator /  (modint<MOD> other) const { return *this *  other.inv(); }
    inline modint<MOD> operator /= (modint<MOD> other)       { return *this *= other.inv(); }
    inline bool operator == (modint<MOD> other) const { return value == other.value; }
    inline bool operator != (modint<MOD> other) const { return value != other.value; }
    inline bool operator < (modint<MOD> other) const { return value < other.value; }
    inline bool operator > (modint<MOD> other) const { return value > other.value; }
};
template <int32_t MOD> modint<MOD> operator * (int64_t value, modint<MOD> n) { return modint<MOD>(value) * n; }
template <int32_t MOD> modint<MOD> operator * (int32_t value, modint<MOD> n) { return modint<MOD>(value % MOD) * n; }
template <int32_t MOD> istream & operator >> (istream & in, modint<MOD> &n) { return in >> n.value; }
template <int32_t MOD> ostream & operator << (ostream & out, modint<MOD> n) { return out << n.value; }

using mint = modint<mod>;

vector<mint> BerlekampMassey(vector<mint> S) {
    int n = (int)S.size(), L = 0, m = 0;
    vector<mint> C(n), B(n), T;
    C[0] = B[0] = 1;
    mint b = 1;
    for(int i = 0; i < n; i++) {
        ++m; mint d = S[i];
        for(int j = 1; j <= L; j++) d += C[j] * S[i - j];
        if (d == 0) continue;
        T = C; mint coef = d * b.inv();
        for(int j = m; j < n; j++) C[j] -= coef * B[j - m];
        if (2 * L > i) continue;
        L = i + 1 - L; B = T; b = d; m = 0;
    }
    C.resize(L + 1); C.erase(C.begin());
    for(auto &x: C)  x *= -1;
    return C;
}
struct Mat {
    int n, m;
    vector< vector<mint> > a;
    Mat() { }
    Mat(int _n, int _m) {n = _n; m = _m; a.assign(n, vector<mint>(m, 0)); }
    Mat(vector< vector<mint> > v) { n = v.size(); m = n ? v[0].size() : 0; a = v; }
    inline void make_unit() {
        assert(n == m);
        for (int i = 0; i < n; i++)  {
            for (int j = 0; j < n; j++) a[i][j] = (i == j);
        }
    }
    inline Mat operator + (const Mat &b) {
        assert(n == b.n && m == b.m);
        Mat ans = Mat(n, m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < m; j++) {
                ans.a[i][j] = a[i][j] + b.a[i][j];
            }
        }
        return ans;
    }   
    inline Mat operator - (const Mat &b) {
        assert(n == b.n && m == b.m);
        Mat ans = Mat(n, m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < m; j++) {
                ans.a[i][j] = a[i][j] - b.a[i][j];
            }
        }
        return ans;
    }
    inline Mat operator * (const Mat &b) {
        assert(m == b.n);
        Mat ans = Mat(n, b.m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < b.m; j++) {
                for(int k = 0; k < m; k++) {
                    ans.a[i][j] += a[i][k] * b.a[k][j];
                }
            }
        }
        return ans;
    }   
    inline Mat operator * (mint k) {
        Mat ans = *this;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                ans.a[i][j] = ans.a[i][j] * k;
            }
        }
        return ans;
    }
    inline Mat pow(long long k) {
        assert(n == m);
        Mat ans(n, n), t = a; ans.make_unit();
        while (k) {
            if (k & 1) ans = ans * t;
            t = t * t;
            k >>= 1;
        }
        return ans;
    }
    inline Mat& operator += (const Mat& b) { return *this = (*this) + b; }
    inline Mat& operator -= (const Mat& b) { return *this = (*this) - b; }
    inline Mat& operator *= (const Mat& b) { return *this = (*this) * b; }
    inline bool operator == (const Mat& b) { return a == b.a; }
    inline bool operator != (const Mat& b) { return a != b.a; }
    void print() {
        cout << "Matrix: \n";
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) cout << a[i][j].value << ' ';
            cout << '\n';
        }
        cout << '\n';
    }
};
mint fib(long long n) {
    //assert (3 * pow(mod, 2) < pow(2, 63));
    assert (n >= 0);
    if (n <= 1) return n;
    int a = 0;
    int b = 1;
    long long i = 1ll << (63 - __builtin_clzll(n) - 1);
    for (; i; i >>= 1) {
        int na = (a *(long long) a + b *(long long) b) % mod;
        int nb = (2ll * a + b) * b % mod;
        a = na;
        b = nb;
        if (n & i) {
            int c = a + b; if (c >= mod) c -= mod;
            a = b;
            b = c;
        }
    }
    return b;
}
// subtask 1
int32_t main() {
  ios_base::sync_with_stdio(0);
  cin.tie(0);
  int t; cin >> t;
  assert(1 <= t && t <= 100);
  while (t--) {
    int k; long long st, d, n; cin >> k >> st >> d >> n;
    assert(k == 1);
    assert(0 <= st && st <= 1e9);
    assert(0 <= d && d <= 1e9);
    assert(0 <= n && n <= 1e18);
    mint x; cin >> x.value;
    assert(x.value >= 0 && x < mod);
    vector<mint> S;
    for (int i = 0; i < 10; i++) {
      S.push_back(fib(st + d * i));
    }
    auto tr = BerlekampMassey(S);
    assert(tr.size() <= 2);
    tr.resize(2);
    Mat M(3, 3);
    M.a[0][0] = tr[0] * x;
    M.a[0][1] = tr[1] * x * x;
    M.a[1][0] = M.a[2][0] = M.a[2][2] = 1;
    auto P = M.pow(n);
    cout << P.a[2][0] * fib(st + d) * x + P.a[2][1] * fib(st) + P.a[2][2] * fib(st) << '\n';
  }
  return 0;
}
Setter's Solution for Subtask 2
#include<bits/stdc++.h>
using namespace std;

const int N = 105, mod = 998244353;

template <const int32_t MOD>
struct modint {
    int32_t value;
    modint() = default;
    modint(int32_t value_) : value(value_) {}
    inline modint<MOD> operator + (modint<MOD> other) const { int32_t c = this->value + other.value; return modint<MOD>(c >= MOD ? c - MOD : c); }
    inline modint<MOD> operator - (modint<MOD> other) const { int32_t c = this->value - other.value; return modint<MOD>(c <    0 ? c + MOD : c); }
    inline modint<MOD> operator * (modint<MOD> other) const { int32_t c = (int64_t)this->value * other.value % MOD; return modint<MOD>(c < 0 ? c + MOD : c); }
    inline modint<MOD> & operator += (modint<MOD> other) { this->value += other.value; if (this->value >= MOD) this->value -= MOD; return *this; }
    inline modint<MOD> & operator -= (modint<MOD> other) { this->value -= other.value; if (this->value <    0) this->value += MOD; return *this; }
    inline modint<MOD> & operator *= (modint<MOD> other) { this->value = (int64_t)this->value * other.value % MOD; if (this->value < 0) this->value += MOD; return *this; }
    inline modint<MOD> operator - () const { return modint<MOD>(this->value ? MOD - this->value : 0); }
    modint<MOD> pow(uint64_t k) const { modint<MOD> x = *this, y = 1; for (; k; k >>= 1) { if (k & 1) y *= x; x *= x; } return y; }
    modint<MOD> inv() const { return pow(MOD - 2); }  // MOD must be a prime
    inline modint<MOD> operator /  (modint<MOD> other) const { return *this *  other.inv(); }
    inline modint<MOD> operator /= (modint<MOD> other)       { return *this *= other.inv(); }
    inline bool operator == (modint<MOD> other) const { return value == other.value; }
    inline bool operator != (modint<MOD> other) const { return value != other.value; }
    inline bool operator < (modint<MOD> other) const { return value < other.value; }
    inline bool operator > (modint<MOD> other) const { return value > other.value; }
};
template <int32_t MOD> modint<MOD> operator * (int64_t value, modint<MOD> n) { return modint<MOD>(value) * n; }
template <int32_t MOD> modint<MOD> operator * (int32_t value, modint<MOD> n) { return modint<MOD>(value % MOD) * n; }
template <int32_t MOD> istream & operator >> (istream & in, modint<MOD> &n) { return in >> n.value; }
template <int32_t MOD> ostream & operator << (ostream & out, modint<MOD> n) { return out << n.value; }

using mint = modint<mod>;

vector<mint> BerlekampMassey(vector<mint> S) {
    int n = (int)S.size(), L = 0, m = 0;
    vector<mint> C(n), B(n), T;
    C[0] = B[0] = 1;
    mint b = 1;
    for(int i = 0; i < n; i++) {
        ++m; mint d = S[i];
        for(int j = 1; j <= L; j++) d += C[j] * S[i - j];
        if (d == 0) continue;
        T = C; mint coef = d * b.inv();
        for(int j = m; j < n; j++) C[j] -= coef * B[j - m];
        if (2 * L > i) continue;
        L = i + 1 - L; B = T; b = d; m = 0;
    }
    C.resize(L + 1); C.erase(C.begin());
    for(auto &x: C)  x *= -1;
    return C;
}
struct Mat {
    int n, m;
    vector< vector<mint> > a;
    Mat() { }
    Mat(int _n, int _m) {n = _n; m = _m; a.assign(n, vector<mint>(m, 0)); }
    Mat(vector< vector<mint> > v) { n = v.size(); m = n ? v[0].size() : 0; a = v; }
    inline void make_unit() {
        assert(n == m);
        for (int i = 0; i < n; i++)  {
            for (int j = 0; j < n; j++) a[i][j] = (i == j);
        }
    }
    inline Mat operator + (const Mat &b) {
        assert(n == b.n && m == b.m);
        Mat ans = Mat(n, m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < m; j++) {
                ans.a[i][j] = a[i][j] + b.a[i][j];
            }
        }
        return ans;
    }   
    inline Mat operator - (const Mat &b) {
        assert(n == b.n && m == b.m);
        Mat ans = Mat(n, m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < m; j++) {
                ans.a[i][j] = a[i][j] - b.a[i][j];
            }
        }
        return ans;
    }
    inline Mat operator * (const Mat &b) {
        assert(m == b.n);
        Mat ans = Mat(n, b.m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < b.m; j++) {
                for(int k = 0; k < m; k++) {
                    ans.a[i][j] += a[i][k] * b.a[k][j];
                }
            }
        }
        return ans;
    }   
    inline Mat operator * (mint k) {
        Mat ans = *this;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                ans.a[i][j] = ans.a[i][j] * k;
            }
        }
        return ans;
    }
    inline Mat pow(long long k) {
        assert(n == m);
        Mat ans(n, n), t = a; ans.make_unit();
        while (k) {
            if (k & 1) ans = ans * t;
            t = t * t;
            k >>= 1;
        }
        return ans;
    }
    inline Mat& operator += (const Mat& b) { return *this = (*this) + b; }
    inline Mat& operator -= (const Mat& b) { return *this = (*this) - b; }
    inline Mat& operator *= (const Mat& b) { return *this = (*this) * b; }
    inline bool operator == (const Mat& b) { return a == b.a; }
    inline bool operator != (const Mat& b) { return a != b.a; }
    void print() {
        cout << "Matrix: \n";
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) cout << a[i][j].value << ' ';
            cout << '\n';
        }
        cout << '\n';
    }
};
mint fib(long long n) {
    //assert (3 * pow(mod, 2) < pow(2, 63));
    assert (n >= 0);
    if (n <= 1) return n;
    int a = 0;
    int b = 1;
    long long i = 1ll << (63 - __builtin_clzll(n) - 1);
    for (; i; i >>= 1) {
        int na = (a *(long long) a + b *(long long) b) % mod;
        int nb = (2ll * a + b) * b % mod;
        a = na;
        b = nb;
        if (n & i) {
            int c = a + b; if (c >= mod) c -= mod;
            a = b;
            b = c;
        }
    }
    return b;
}
// subtask 2
int32_t main() {
  ios_base::sync_with_stdio(0);
  cin.tie(0);
  int t; cin >> t;
  assert(1 <= t && t <= 100);
  while (t--) {
    int k; long long st, d, n; cin >> k >> st >> d >> n;
    assert(1 <= k && k <= 20);
    assert(0 <= st && st <= 1e9);
    assert(0 <= d && d <= 1e9);
    assert(0 <= n && n <= 1e18);
    Mat x(k, k);
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < k; j++) {
            cin >> x.a[i][j];
            assert(x.a[i][j].value >= 0 && x.a[i][j] < mod);
        }
    }
    vector<mint> S;
    for (int i = 0; i < 10; i++) {
      S.push_back(fib(st + d * i));
    }
    auto tr = BerlekampMassey(S);
    assert(tr.size() <= 2);
    tr.resize(2);
    Mat M(3 * k, 3 * k);
    auto tmp = x * tr[0];
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < k; j++) {
            M.a[i][j] = tmp.a[i][j];
        }
    }    
    tmp = x * x * tr[1];
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < k; j++) {
            M.a[i][j + k] = tmp.a[i][j];
        }
    }
    for (int i = 0; i < k; i++) {
        M.a[k + i][i] = 1;
    }    
    for (int i = 0; i < k; i++) {
        M.a[k * 2 + i][i] = 1;
    }    
    for (int i = 0; i < k; i++) {
        M.a[k * 2 + i][i + 2 * k] = 1;
    }
    auto P = M.pow(n);
    Mat A(k, 3 * k);
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < 3 * k; j++) {
            A.a[i][j] = P.a[i + 2 * k][j];
        }
    }
    Mat B(3 * k, k);
    tmp = x * fib(st + d);
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < k; j++) {
            B.a[i][j] = tmp.a[i][j];
        }
    }   
    for (int i = 0; i < k; i++) {
        B.a[i + k][i] = fib(st);
    }    
    for (int i = 0; i < k; i++) {
        B.a[i + 2 * k][i] = fib(st);
    }
    auto ans = A * B;
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < k; j++) {
            cout << ans.a[i][j] << ' ';
        }
        cout << '\n';
    }
  }
  return 0;
}
Setter's Solution, Full Score
#include<bits/stdc++.h>
using namespace std;

const int N = 505, mod = 998244353;

struct BigInt {
    vector<int> d;
    BigInt() {}
    BigInt(string s) {
        for (int i = (int)s.size(); i --; ) {
            d.push_back(s[i] - '0');
        }
    }
    int val() {
        int ret = 0;
        for (int i = (int)d.size(); i --; ) {
            ret = (ret * 10 + d[i]);
        }
        return ret;
    }
    void relax() {
        d.push_back(0);
        for (int i = 0; i + 1 < d.size(); i ++) {
            d[i + 1] += d[i] / 10, d[i] %= 10; 
        }
        while (!d.back() and d.size() > 1) {
            d.pop_back();
        }
    }
    void plus() {
        d[0] ++;
        relax();
    }
    void div2() {
        d[0] /= 2;
        for (int i = 1; i < d.size(); i ++) {
            d[i - 1] += 5 * (d[i] % 2), d[i] /= 2;
        }
        while (!d.back() and d.size() > 1) {
            d.pop_back();
        }
    }
    bool is_zero() {
        return d.size() == 1 && d.back() == 0;
    }
    int mod2() {
        return d[0] % 2;
    }
};
template <const int32_t MOD>
struct modint {
    int32_t value;
    modint() = default;
    modint(int32_t value_) : value(value_) {}
    inline modint<MOD> operator + (modint<MOD> other) const { int32_t c = this->value + other.value; return modint<MOD>(c >= MOD ? c - MOD : c); }
    inline modint<MOD> operator - (modint<MOD> other) const { int32_t c = this->value - other.value; return modint<MOD>(c <    0 ? c + MOD : c); }
    inline modint<MOD> operator * (modint<MOD> other) const { int32_t c = (int64_t)this->value * other.value % MOD; return modint<MOD>(c < 0 ? c + MOD : c); }
    inline modint<MOD> & operator += (modint<MOD> other) { this->value += other.value; if (this->value >= MOD) this->value -= MOD; return *this; }
    inline modint<MOD> & operator -= (modint<MOD> other) { this->value -= other.value; if (this->value <    0) this->value += MOD; return *this; }
    inline modint<MOD> & operator *= (modint<MOD> other) { this->value = (int64_t)this->value * other.value % MOD; if (this->value < 0) this->value += MOD; return *this; }
    inline modint<MOD> operator - () const { return modint<MOD>(this->value ? MOD - this->value : 0); }
    modint<MOD> pow(uint64_t k) const { modint<MOD> x = *this, y = 1; for (; k; k >>= 1) { if (k & 1) y *= x; x *= x; } return y; }
    modint<MOD> inv() const { return pow(MOD - 2); }  // MOD must be a prime
    inline modint<MOD> operator /  (modint<MOD> other) const { return *this *  other.inv(); }
    inline modint<MOD> operator /= (modint<MOD> other)       { return *this *= other.inv(); }
    inline bool operator == (modint<MOD> other) const { return value == other.value; }
    inline bool operator != (modint<MOD> other) const { return value != other.value; }
    inline bool operator < (modint<MOD> other) const { return value < other.value; }
    inline bool operator > (modint<MOD> other) const { return value > other.value; }
};
template <int32_t MOD> modint<MOD> operator * (int64_t value, modint<MOD> n) { return modint<MOD>(value) * n; }
template <int32_t MOD> modint<MOD> operator * (int32_t value, modint<MOD> n) { return modint<MOD>(value % MOD) * n; }
template <int32_t MOD> istream & operator >> (istream & in, modint<MOD> &n) { return in >> n.value; }
template <int32_t MOD> ostream & operator << (ostream & out, modint<MOD> n) { return out << n.value; }

using mint = modint<mod>;

struct Mat {
    int n, m;
    vector< vector<mint> > a;
    Mat() { }
    Mat(int _n, int _m) {n = _n; m = _m; a.assign(n, vector<mint>(m, 0)); }
    Mat(vector< vector<mint> > v) { n = v.size(); m = n ? v[0].size() : 0; a = v; }
    inline void make_unit() {
        assert(n == m);
        for (int i = 0; i < n; i++)  {
            for (int j = 0; j < n; j++) a[i][j] = (i == j);
        }
    }
    inline Mat operator + (const Mat &b) {
        assert(n == b.n && m == b.m);
        Mat ans = Mat(n, m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < m; j++) {
                ans.a[i][j] = a[i][j] + b.a[i][j];
            }
        }
        return ans;
    }   
    inline Mat operator - (const Mat &b) {
        assert(n == b.n && m == b.m);
        Mat ans = Mat(n, m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < m; j++) {
                ans.a[i][j] = a[i][j] - b.a[i][j];
            }
        }
        return ans;
    }
    inline Mat operator * (const Mat &b) {
        assert(m == b.n);
        Mat ans = Mat(n, b.m);
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < b.m; j++) {
                for(int k = 0; k < m; k++) {
                    ans.a[i][j] += a[i][k] * b.a[k][j];
                }
            }
        }
        return ans;
    }   
    inline Mat operator * (mint k) {
        Mat ans = *this;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                ans.a[i][j] = ans.a[i][j] * k;
            }
        }
        return ans;
    }
    inline Mat pow(long long k) {
        assert(n == m);
        Mat ans(n, n), t = a; ans.make_unit();
        while (k) {
            if (k & 1) ans = ans * t;
            t = t * t;
            k >>= 1;
        }
        return ans;
    }
    inline Mat& operator += (const Mat& b) { return *this = (*this) + b; }
    inline Mat& operator -= (const Mat& b) { return *this = (*this) - b; }
    inline Mat& operator *= (const Mat& b) { return *this = (*this) * b; }
    inline bool operator == (const Mat& b) { return a == b.a; }
    inline bool operator != (const Mat& b) { return a != b.a; }
    void print() {
        cout << "Matrix: \n";
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) cout << a[i][j].value << ' ';
            cout << '\n';
        }
        cout << '\n';
    }
};
mint fib(long long n) {
    //assert (3 * pow(mod, 2) < pow(2, 63));
    assert (n >= 0);
    if (n <= 1) return n;
    int a = 0;
    int b = 1;
    long long i = 1ll << (63 - __builtin_clzll(n) - 1);
    for (; i; i >>= 1) {
        int na = (a *(long long) a + b *(long long) b) % mod;
        int nb = (2ll * a + b) * b % mod;
        a = na;
        b = nb;
        if (n & i) {
            int c = a + b; if (c >= mod) c -= mod;
            a = b;
            b = c;
        }
    }
    return b;
}

vector<mint> BerlekampMassey(vector<mint> S) {
    int n = (int)S.size(), L = 0, m = 0;
    vector<mint> C(n), B(n), T;
    C[0] = B[0] = 1;
    mint b = 1;
    for(int i = 0; i < n; i++) {
        ++m; mint d = S[i];
        for(int j = 1; j <= L; j++) d += C[j] * S[i - j];
        if (d == 0) continue;
        T = C; mint coef = d * b.inv();
        for(int j = m; j < n; j++) C[j] -= coef * B[j - m];
        if (2 * L > i) continue;
        L = i + 1 - L; B = T; b = d; m = 0;
    }
    C.resize(L + 1); C.erase(C.begin());
    for(auto &x: C)  x *= -1;
    return C;
}
mt19937 rnd(chrono::steady_clock::now().time_since_epoch().count());;

Mat pw[N], pref[N];
vector<mint> MatrixRecurrence(int n, Mat a) {
    vector<mint> rd1, rd2;
    for (int i = 0; i < n; i++) {
        rd1.push_back(1 + rnd() % (mod - 1));
        rd2.push_back(1 + rnd() % (mod - 1));
    }
    vector<mint> v;
    for (int k = 0; k < 4 * n + 10; k++) {
        vector<mint> sum(n);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sum[i] += pref[k].a[i][j] * rd2[j]; 
            }
        }
        mint tmp = 0;
        for (int j = 0; j < n; j++) tmp += sum[j] * rd1[j];
        v.push_back(tmp);
    }
    auto ans = BerlekampMassey(v);
    return ans;
}
bool verify(int n, Mat a, vector<mint> p) {
    int k = 4 * n + 20;
    Mat ans(n, n); int sz = p.size();
    for (int i = 0; i < sz; i++) {
        ans += pref[k - i - 1] * p[i];
    }
    if (ans != pref[k]) return 0;
    return 1;
}
vector<mint> combine (int n, vector<mint> &a, vector<mint> &b, vector<mint> &tr) {
    vector<mint> res(n * 2 + 1, 0);
    for (int i = 0; i < n + 1; i++) {
        for (int j = 0; j < n + 1; j++) res[i + j] += a[i] * b[j];
    }
    for (int i = 2 * n; i > n; --i) {
        for (int j = 0; j < n; j++) res[i - 1 - j] += res[i] * tr[j];
    }
    res.resize(n + 1);
    return res;
};
// transition -> for(i = 0; i < x; i++) f[n] += tr[i] * f[n-i-1]
// S contains initial values, k is 0 indexed
Mat LinearRecurrence(int z, vector<Mat> &S, vector<mint> &tr, BigInt k) {
    int n = S.size(); assert(n == (int)tr.size());
    if (n == 0) return Mat(z, z);
    if (k.d.size() <= 5 && n > k.val()) {
        return S[k.val()];
    }
    vector<mint> pol(n + 1), e(pol);
    pol[0] = e[1] = 1;
    for (k.plus(); !k.is_zero(); k.div2()) {
        if (k.mod2() == 1) pol = combine(n, pol, e, tr);
        e = combine(n, e, e, tr);
    }
    Mat res = Mat(z, z);
    for (int i = 0; i < n; i++) res += S[i] * pol[i + 1];
    return res;
}
int main() {
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t; cin >> t;
    while (t--) {
        int k; long long st, d; string nn;
        cin >> k >> st >> d >> nn;
        BigInt n(nn);
        Mat a(k, k);
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                cin >> a.a[i][j];
            }
        }
        pw[0] = a.pow(0);
        for (int i = 1; i <= 4 * k + 20; i++) {
            pw[i] = pw[i - 1] * a;
        }
        pref[0] = pw[0] * fib(st);
        for (int i = 1; i <= 4 * k + 20; i++) {
            pref[i] = pref[i - 1] + pw[i] * fib(1LL * i * d + st);
        }
        vector<mint> p;
        do {
            p = MatrixRecurrence(k, a);
        }
        while (!verify(k, a, p));
        int sz = p.size();
        assert(sz <= 2 * k + 1);
        Mat ans(k, k);
        vector<Mat> S;
        for (int i = 0; i < sz; i++) {
            S.push_back(pref[i]);
        }
        ans = LinearRecurrence(k, S, p, n);
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                cout << ans.a[i][j] << ' ';
            }
            cout << '\n';
        }
    }
    return 0;
}
Tester's Solution
#include <bits/stdc++.h>
 
#define all(x) (x).begin(), (x).end()
#define rall(x) (x).rbegin(), (x).rend()
#define forn(i, n) for (int i = 0; i < (int)(n); ++i)
#define for1(i, n) for (int i = 1; i <= (int)(n); ++i)
#define ford(i, n) for (int i = (int)(n) - 1; i >= 0; --i)
#define fore(i, a, b) for (int i = (int)(a); i <= (int)(b); ++i)
 
template<class T> bool umin(T &a, T b) { return a > b ? (a = b, true) : false; }
template<class T> bool umax(T &a, T b) { return a < b ? (a = b, true) : false; }
 
using namespace std;
 
 
long long readInt(long long l, long long r, char endd) {
	long long x = 0;
	int cnt = 0;
	int fi = -1;
	bool is_neg = false;
	while (true) {
		char g = getchar();
		if (g == '-') {
			assert(fi == -1);
			is_neg = true;
			continue;
		}
		if ('0' <= g && g <= '9') {
			x *= 10;
			x += g - '0';
			if (cnt == 0) {
				fi = g - '0';
			}
			cnt++;
			assert(fi != 0 || cnt == 1);
			assert(fi != 0 || is_neg == false);
 
			assert(!(cnt > 19 || (cnt == 19 && fi > 1)));
		}
		else if (g == endd) {
			assert(cnt > 0);
			if (is_neg) {
				x = -x;
			}
			assert(l <= x && x <= r);
			return x;
		}
		else {
//			assert(false);
		}
	}
}
 
string readString(int l, int r, char endd) {
	string ret = "";
	int cnt = 0;
	while (true) {
		char g = getchar();
		if(g=='\r')
			continue;
		assert(g != -1);
		if (g == endd) {
			break;
		}
		cnt++;
		ret += g;
	}
	assert(l <= cnt && cnt <= r);
	return ret;
}
long long readIntSp(long long l, long long r) {
	return readInt(l, r, ' ');
}
long long readIntLn(long long l, long long r) {
	return readInt(l, r, '\n');
}
string readStringLn(int l, int r) {
	return readString(l, r, '\n');
}
string readStringSp(int l, int r) {
	return readString(l, r, ' ');
}
 
 
//PANIC
const uint32_t mod = 998244353;
 
std::mt19937 rng(0x15c51);
uint32_t getRandom(uint32_t from, uint32_t to)
{
	std::uniform_int_distribution<uint32_t> distribution(from, to);
	return distribution(rng);
}
 
uint64_t powMod(uint64_t a, uint64_t pw)
{
	uint64_t res(1);
	while (pw)
	{
		if (pw & 1)
		{
			res = (res * a) % mod;
		}
		pw >>= 1;
		a = (a*a) % mod;
	}
	return res;
}
 
uint64_t inverse(uint64_t a)
{
	return powMod(a, mod - 2);
}
 
inline vector<uint32_t> BerlekampMassey(const vector<uint32_t>& x)
{
	//ls: (shortest) relation sequence (after filling zeroes) so far
	//cur: current relation sequence
	vector<uint32_t> ls, cur;
	//lf: the position of ls (t')
	//ld: delta of ls (v')
	uint32_t lf, ld;
	forn(i, x.size())
	{
		uint64_t t = 0;
		//evaluate at position i
		forn(j, cur.size())
		{
			t = (t + x[i - j - 1] * static_cast<int64_t>(cur[j])) % mod;
		}
 
		if (t == x[i])
			continue; //good so far
		//first non-zero position
		if (!cur.size())
		{
			cur.resize(i + 1);
			lf = i; ld = (t + mod - x[i]) % mod;
			continue;
		}
		//cur=cur-c/ld*(x[i]-t)
		uint64_t k = (t - x[i] + mod)*inverse(ld) % mod;
		vector<uint32_t> c(i - lf - 1); //add zeroes in front
		c.push_back(k);
		forn(j, ls.size())
			c.push_back(k * (mod-ls[j]) % mod);
		if (c.size() < cur.size())
			c.resize(cur.size());
		for (int j = 0; j<int(cur.size()); ++j)
			c[j] = (c[j] + cur[j]) % mod;
		//if cur is better than ls, change ls to cur
		if (i - lf + (int)ls.size() >= (int)cur.size())
			ls = cur, lf = i, ld = (t + mod - x[i]) % mod;
		cur = c;
	}
	return cur;
}
 
uint32_t addMod(uint32_t a, uint32_t b)
{
	uint32_t res = a + b;
	if (res >= mod)
		res -= mod;
	return res;
}
 
uint32_t mulMod(uint32_t a, uint32_t b)
{
	return static_cast<uint32_t>((static_cast<uint64_t>(a)*b) % mod);
}
 
uint32_t mulMod2(uint32_t a, uint32_t b)
{
	return static_cast<uint32_t>((static_cast<uint64_t>(a)*b*5) % mod);
}
 
struct fibonacci_t {//1+sqrt(5)
	uint32_t a;
	uint32_t b;
 
	fibonacci_t(): a(0), b(0) {}
	fibonacci_t(uint32_t _a, uint32_t _b) :
		a(_a), b(_b){}
 
	fibonacci_t operator+(const fibonacci_t& o)  const
	{
		return fibonacci_t(addMod(a,o.a), addMod(b,o.b));
	}
 
	fibonacci_t operator*(uint32_t c) const
	{
		return fibonacci_t(mulMod(c, a), mulMod(c,b));
	}
 
	fibonacci_t operator*(const fibonacci_t& o) const
	{
		uint32_t ra = addMod(mulMod(a, o.a), mulMod2(b, o.b));
		uint32_t rb = addMod(mulMod(a, o.b), mulMod(b, o.a));
		return fibonacci_t(ra, rb);
	}
 
	fibonacci_t pow(uint32_t pw) const
	{
		fibonacci_t tmp(a,b);
		fibonacci_t res(1, 0);
		while (pw)
		{
			if (pw & 1)
			{
				res = res * tmp;
			}
			pw >>= 1;
			tmp = tmp * tmp;
		}
		return res;
	}
};
 
struct matrix_t {
	static uint32_t dim;
	static vector<uint32_t> leftRandom;
	static vector<uint32_t> rightRandom;
	typedef vector<vector<uint32_t>> data_t;
	data_t data;
 
	matrix_t()
	{
		data = data_t(dim, vector<uint32_t>(dim));
	}
 
	matrix_t(bool identity)
	{
		data = data_t(dim, vector<uint32_t>(dim));
		if (identity)
		{
			forn(i, dim)
				data[i][i] = 1;
		}
	}
 
	matrix_t(const data_t& d) : data(d) {}
	matrix_t(const matrix_t& m) : data(m.data) {}
 
	static void init(uint32_t _dim)
	{
		dim = _dim;
		leftRandom.resize(dim);
		rightRandom.resize(dim);
		for (auto& lf : leftRandom)
			lf = getRandom(2, mod - 2);
		for (auto& rg : rightRandom)
			rg = getRandom(2, mod - 2);
	}
 
	matrix_t operator+(const matrix_t& o) const
	{
		matrix_t res;
		forn(i, dim) forn(j, dim) res.data[i][j] = (data[i][j] + o.data[i][j]) % mod;
		return res;
	}
 
	matrix_t operator*(const matrix_t& o) const
	{
		matrix_t res;
		forn(i, dim) forn(j, dim)
		{
			forn(k, dim)
			{
				res.data[i][j] = (res.data[i][j] + static_cast<uint64_t>(data[i][k]) * o.data[k][j]) % mod;
			}
		}
		return res;
	}
 
	matrix_t operator*(const uint32_t c) const
	{
		matrix_t res;
		forn(i, dim) forn(j, dim)
		{
			res.data[i][j] = (data[i][j] * static_cast<uint64_t>(c))%mod;
		}
		return res;
	}
 
	uint32_t projection() const
	{
		vector<uint32_t> leftM(dim);
		forn(i, dim)
		{
			forn(j, dim)
			{
				leftM[i] = (leftM[i] + static_cast<uint64_t>(data[i][j]) * leftRandom[j]) % mod;
			}
		}
		uint32_t res = 0;
		forn(i, dim)
		{
			res = (res + static_cast<uint64_t>(rightRandom[i]) * leftM[i]) % mod;
		}
		return res;
	}
 
	void print() const
	{
		forn(i, dim)
		{
			forn(j, dim)
			{
				cout << data[i][j] << ' ';
			}
			cout << '\n';
		}
	}
};
 
uint32_t matrix_t::dim;
vector<uint32_t> matrix_t::leftRandom;
vector<uint32_t> matrix_t::rightRandom;
 
//0 is the least significant
 
void normalize(vector<uint32_t>& a, const vector<uint32_t>& recurr)
{
	if (a.size() < recurr.size())
		return;
	auto K = recurr.size();
	while (a.size() > recurr.size())
	{
		uint64_t ac = a.back();
		forn(i, K)
		{
			size_t cp = a.size() - K - 1 + i;
			a[cp] = (a[cp] + ac * recurr[K-1-i]) % mod;
		}
		a.pop_back();
	}
	while (a.size() && a.back() == 0)
		a.pop_back();
}
 
vector<uint32_t> polMul(const vector<uint32_t>& a, const vector<uint32_t>& b)
{
	if (a.empty() || b.empty())
		return vector<uint32_t>();
	vector<uint32_t> res(a.size()+ b.size()-1);
	forn(i,a.size()) forn(j, b.size())
	{
		res[i+j] = (res[i+j] + static_cast<uint64_t>(a[i]) * b[j]) % mod;
	}
	return res;
}
	
int main(int argc, char** argv) 
{
#ifdef HOME
	if(IsDebuggerPresent())
	{
		freopen("../PANIC_19.in", "rb", stdin);
		freopen("../out.txt", "wb", stdout);
	}
#endif
	int T = readIntLn(1, 100);
	forn(tc, T)
	{
		int K = readIntSp(1, 100);
		int a = readIntSp(0, 1'000'000'000);
		int d = readIntSp(0, 1'000'000'000);
		int NMAXLEN = 1001;
		string N = readStringLn(1, NMAXLEN);
		assert(!N.empty());
		assert(N[0] != '0' || N.size() == 1);
		forn(i,N.size())
			assert(N[i] >= '0' && N[i] <= '9');
		if (N.size() == NMAXLEN)
		{
			assert(N[0] == '1');
			fore(i,1,NMAXLEN-1)
				assert(N[i] == '0');
		}
		matrix_t::init(K);
 
		vector<vector<uint32_t> > inData(K, vector<uint32_t>(K));
		forn(i, K) forn(j, K)
		{
			if (j + 1 == K)
				inData[i][j] = readIntLn(0, mod - 1);
			else
				inData[i][j] = readIntSp(0, mod - 1);
		}
		
		vector<matrix_t> w;
		w.emplace_back(matrix_t(true));
		w.emplace_back(matrix_t(inData));
		while (w.size() < 4 * K + 2)
		{
			matrix_t nm = w.back() * w[1];
			w.emplace_back(nm);
		}
 
		//calculate step 
		fibonacci_t step = fibonacci_t(inverse(2), inverse(2)).pow(d);
		fibonacci_t start = fibonacci_t(inverse(2), inverse(2)).pow(a);
 
		{
			fibonacci_t act = start;
			forn(i, w.size())
			{
				w[i] = w[i] * ((act.b+act.b)%mod);
				act = act * step;
			}
			forn(i, w.size())
			{
				if (i)
				{
					w[i] = w[i] + w[i - 1];
				}
			}
		}
		vector<uint32_t> wp(w.size());
 		forn(i, wp.size())
 			wp[i] = w[i].projection();
		
		vector<uint32_t> bm = BerlekampMassey(wp);
		vector<uint32_t> act, curr(1,1);
		reverse(N.begin(), N.end());
		vector<uint32_t> actP(2);
		actP[1] = 1;
		forn(ni, N.size())
		{
			auto actP2 = polMul(actP, actP);
			normalize(actP2, bm);
			auto actP4 = polMul(actP2, actP2);
			normalize(actP4, bm);
			auto actP8 = polMul(actP4, actP4);
			normalize(actP8, bm);
			uint8_t actN = N[ni]-'0';
			if (actN >= 8)
			{
				curr = polMul(curr, actP8);
				normalize(curr, bm);
				actN -= 8;
			}
				
			if (actN >= 4)
			{
				curr = polMul(curr, actP4);
				normalize(curr, bm);
				actN -= 4;
			}
 
			if (actN >= 2)
			{
				curr = polMul(curr, actP2);
				normalize(curr, bm);
				actN -= 2;
			}
			
			if (actN)
			{
				curr = polMul(curr, actP);
				normalize(curr, bm);
				actN--;
			}
			assert(actN == 0);
			actP = polMul(actP2, actP8);
			normalize(actP, bm);
		}
 
		matrix_t res;
		forn(i, curr.size())
		{
			res = res + (w[i] * curr[i]);
		}
		res.print();
	}
	assert(getchar() == -1);
	return 0;
}

ADDITIONAL COMMENTS:

The intended solution was to use the randomized Berlekamp-Massey Algorithm. I created this problem just to introduce this trick! But during the contest, almost all of the solutions were using the alternate one :frowning:. I wasn’t aware of this solution before the contest. I broke one of my keyboards out of rage as I missed that trivial solution. But the emphatic thing is, the intended solution will work for any linear recurrence! So, I think you should be at least aware of this randomized solution.

SWEARS:

I will never use the Fibonacci sequence as a linear recurrence in any of my future problems!

VIDEO EDITORIAL:

6 Likes

Amazing problem, turned out to be even more interesting than first appeared. Sorry about your broken keyboard lol, but calling that alternative solution “trivial” - wow! Nevertheless, very nice problem and great editorial, nice work!

2 Likes

My solution goes as follows -
Finding characteristics polynomial -
We know that it is of form det(A-xI)=0 and has degree n. We can uniquely determine all coefficients using Multi-point Evaluation by first calculation these values x=0,1,…,n and then finding the polynomial. Let’s say C(x) is the characteristics polynomial.

Due Cayley–Hamilton theorem we can replace M in given equation by x in S and find it A(x) = S(x) mod C(x). Since A(x) is a polynomial of degree n. We can then compute M^i for 0<=i<=n and just bruteforce.

Let D be the matrix for Fibonacci recurrence [[1 1] [1 0]]
Lets replace F_{a+id} by D^{a+id}.
S becomes D^a \sum_{0}^{n}{(D^{d}*x)^i}
Let B=D^d.
We can treat it as matrixes with polynomials in coefficients modulo C(x).

Then compute sum of this GP in O(\log(P)).
This is enough for the first 2 subtasks. TLEs due to a lot of matrix multiplications.

An important observation for me was that Fibonacci numbers modulo mod are periodic. For prime mod, this period is O(mod). I brute forced offline to find P=1996488708 is the period given modulo.

Now one can abuse the fact that B^P=I. Let N=P*Q+R. We can divide this sum into two parts such that S= A* ((B^0+B^1+...+B^{P-1})*(1+x^1+x^2+...+x^{Q-1}) + (B^0+B^1+...+B^{R})). This splits the last subtask into 2 instances of 2nd subtask.

3 Likes

This post was flagged by the community and is temporarily hidden.

@sjshohag Please tell the test case where my code is failing: CodeChef: Practical coding for everyone
Please tell any case of Task #16.

You can also compute the minimal polynomial of M \in F^{nxn} in O(n^3) time with constant probability by picking a random vector v, then computing the first linear dependence among {v, M v, M^2 v, M^3 v, …, M^n v} that arises. This only fails if v lies in a nontrivial eigenspace of M and M is not the identity, which happens with probability <= 1/|F| + 1/|F|^{n-1} - 1/|F|^n <= 3/4.