REPCOEFF - Editorial

iceknight1093 · August 13, 2025, 4:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Dynamic programming, finding GCD of all subsegments

PROBLEM:

You’re given an array A and a parameter K.
You can do the following:

Choose an index i (1 \leq i \lt N), delete A_i and A_{i+1}, and insert (x\cdot A_i + y\cdot A_{i+1}) \bmod K in their place.
x and y are integers chosen by you.

Find the number of distinct reachable arrays.

EXPLANATION:

Suppose we perform several operations and reach the array B = [B_1, B_2, \ldots, B_M].
Since each operation takes two adjacent elements and replaces them with some linear combination (modulo K), each B_i will correspond to a linear combination of some contiguous array of A.

Specifically, there will be M pairs of indices (l_i, r_i) such that B_i is obtained as a result of a linear combination modulo K on the subarray A[l_i, r_i].
Of course, these M subarrays will partition the entire array A.

So, the first question we should ask is: if we have the subarray [A_l, \ldots, A_r], which values can be be reduced to?

Answer

Suppose we didn’t have the “reduce modulo K” condition.
Then, the values we can form from [A_l, \ldots, A_r] are exactly all those of the form

c_1A_l + c_2A_{l+1} + \ldots + c_{r-l+1}A_r

for some integers c_i.

It’s well-known that the set of integers that can be formed this way is exactly all the multiples of \gcd(A_l, \ldots, A_r).

Now, we need to work with this but modulo K.
Taking a number modulo K is really just adding/subtracting some multiple of K from it.
We can simply include K into the linear combination to achieve this effect, so that the final set of numbers we can reach are exactly all the multiples of
\gcd(A_l, \ldots, A_r, K) that are in [0, K).

There is one edge case: when l = r, we can’t change A_l at all.

Now that we know this, there’s a natural solution that one might think of will involve dynamic programming.
Namely, let’s define dp_i to be the number of distinct arrays that can be obtained from the first i elements, with our requisite answer being dp_N.
The transitions to this might be something like: to compute dp_i, fix j \leq i as the left endpoint of the subarray ending at i, then add dp_{j-1} \cdot c to dp_i, where c is the number of values that the subarray [A_j, \ldots, A_i] can be reduced to.

However, before we even think of speeding this up, it isn’t correct in the first place!
The reason is that many arrays will be overcounted by this procedure - a simple example is to look at A = [1, 2, 2], which can be reduced to [1, 2] via subarrays ending both at index 2 and index 3, and our above solution will count it at both which is not what we want (because this will affect future transitions, if the array was longer.)

To get around this, we change the definition of our DP slightly - define dp_i to be the number of arrays that can be first formed from elements [A_1, \ldots, A_i] (so this excludes arrays that can be formed by a smaller prefix).

We now need to modify the transitions to account for this.
So, suppose we fix j \leq i and consider the subarray A[j\ldots i].
Let g(j, i) denote the value \gcd(A_j, \ldots, A_i, K).
There are now a couple of cases.

Case 1: j = i.
In this case we can only extend any array ending at i-1 to an array ending at i.
This is always going to be valid; so we add dp_{i-1} to dp_i.

Case 2: j \lt i.
Here, abusing notation a bit, let g = g(j, i). There are \frac{K}{g} choices for the last element of the subarray (namely 0, g, 2g, 3g, \ldots, K-g), so it seems that we want to add \frac{K}{g} \cdot dp_{j-1} to dp_i.
However, not all of these are valid: specifically, if some multiple of g is possible to create at a subarray A[j\ldots i_0] where i_0 \lt i, we shouldn’t consider this element: the array we created would’ve been counted at i_0 (or before) instead.

The question now is, how many such “bad” multiples of g are there? Specifically, when is the multiple x\cdot g “bad”?
Well, by definition, that only happens if there’s some i_0 \lt i such that g(j,i_0) \mid x\cdot g.
It’s not hard to see that if such an i_0 exists, i-1 can be chosen as i_0.

So really, what we want to do is exclude all multiples of g(j, i-1).
Thus, what we want to do is add \left(\frac{K}{g(j, i)} - \frac{K}{g(j, i-1)}\right)\cdot dp_{j-1} to dp_i.
Note that j = i-1 is an edge case here, where we want to subtract 1 instead of \frac{K}{g(j, i-1)}.
To avoid division-by-zero issues, you can treat 0 as K instead.

We now have a working quadratic solution, where dp_i is defined to be

dp_{i-1} + \left(\frac{K}{g(i-1, i)} - 1\right)\cdot dp_{i-2} + \sum_{j=1}^{i-2} \left(\frac{K}{g(j, i)} - \frac{K}{g(j, i-1)}\right)\cdot dp_{j-1}

The overall answer is the sum of all dp_i.
Our next step is to optimize this.

To do that, observe that for a fixed i, most values of j aren’t really very useful.
Particularly, if g(j, i) = g(j, i-1) for some j, then such a j will contribute 0 to dp_i anyway, and so can be ignored.

So, let’s fix the index j, and try to compute all i for which it will be a useful transition.
As noted above, this is simply all those i \gt j for which g(j, i) \neq g(j, i-1).
However, there can only be \mathcal{O}(\log K) such indices since the GCD must at least halve each time it changes - and finding all change indices quickly is well known (see number 3 here, or use binary search and a range GCD data structure).

This means there are only \mathcal{O}(N\log K) transitions in total, which is now fast enough to solve the problem.

TIME COMPLEXITY:

\mathcal{O}(N\log K) per testcase.

CODE:

Editorialist's code (C++)

// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    const int mod = 998244353;

    int t; cin >> t;
    while (t--) {
        int n, k; cin >> n >> k;
        vector a(n+1, 0);
        for (int i = 1; i <= n; ++i) {
            cin >> a[i];
            if (a[i] == 0) a[i] = k;
            a[i] = gcd(a[i], k);
        }

        vector trans(n+2, vector<array<int, 2>>());
        vector<array<int, 2>> seg;
        for (int i = n; i >= 1; --i) {
            vector<array<int, 2>> nseg = {{a[i], i}};
            if (i+1 <= n) nseg.push_back({gcd(a[i], a[i+1]), i+1});
            for (auto [x, y] : seg) {
                int nx = gcd(x, a[i]);
                if (nx == nseg.back()[0]) continue;
                nseg.push_back({nx, y});
            }
            seg = nseg;
            int prv = 0;
            for (auto [x, y] : seg) {
                if (y == i) trans[y].push_back({i, 1});
                else if (y == i+1) trans[y].push_back({i, k/x - 1});
                else trans[y].push_back({i, k/x - k/prv});
                prv = x;
            }
        }

        vector dp(n+1, 0);
        dp[0] = 1;
        for (int i = 1; i <= n; ++i) {
            for (auto [j, c] : trans[i])
                dp[i] = (dp[i] + 1ll*c*dp[j-1])%mod;
        }
        cout << accumulate(begin(dp)+1, end(dp), 0ll) % mod << '\n';
    }
}