PROBLEM LINK:
Author: Gaoyuan Chen
Tester: Hiroto Sekido
Editorialist: Kevin Atienza
DIFFICULTY:
HARD
PREREQUISITES:
Generating functions, fast Fourier transform, number theoretic transform, polynomial mod, polynomial division
PROBLEM:
Let (A_0, A_1, A_2, \ldots) be sequence defined by the recurrence:
A_0, A_1, \ldots, A_{k1} are given, along with the coefficients C_1, \ldots, C_k. Given N, what is A_N?
Note that in the problem, the sequence starts at A_1, not A_0, but the adjustment can be easily done and will greatly simplify this editorial. (To adjust it, simply shift (A_1, \ldots, A_k) to (A_0, \ldots, A_{k1}), and decrement N by 1).
QUICK EXPLANATION:
Let
Also, let s_{k1}, s_{k2}, \ldots s_0 be defined by the following congruence:
Then we have the following amazing identity/congruence:
Computing x^N \pmod{104857601, M(x)} can be done using fast exponentiation, fast polynomial multiplication and fast polynomial mod. Each multiplication and mod runs in O(k \log k) time, and there are \log N steps, so the algorithm runs in O(k \log k \log N) time.
Note that \pmod{104857601, M(x)} means that the coefficients of polynomials are reduced modulo 104857601, and the polynomials themselves are reduced modulo M(x). Also, note that the implementation should be really efficient! We will discuss optimizations below.
EXPLANATION:
Let’s start with the usual ways of solving linear recurrences. In the following, let’s assume that N \ge k (the case N < k is trivial!). Furthermore, all integer arithmetic is assumed to be done modulo 104857601, unless stated otherwise.
Matrix exponentiation (too slow!)
A popular way of getting the $N$th term of a linear recurrence quickly is to use matrix exponentiation. It works because of the following identity, which holds for all i (we invite the reader to verify that this works):
From this, it follows (also using associativity of matrix multiplication) that
This means that we can get the answer, A_N, by simply raising a certain k imes k matrix to the power Nk+1. One can use fast exponentiation for this!
Unfortunately, this is slow to pass even the first subtask (k \le 3000). A naïve way of implementing this takes O(k^3 \log N) time, because it takes O(k^3) time to multiply two matrices, and there are O(\log N) multiplications to be done. Even using the Strassen algorithm for multiplying matrices still takes too slow: O(k^{2.81} \log N). Using more advanced matrix multiplication techniques is not worth it because the constant factors are large, and in any case there is a lower bound of \Omega(k^2 \log k) for it, so it won’t pass the last subtask anyhow. Clearly, we need to find a better approach.
Generating functions (passes subtask 1)
We can derive a much faster solution by means of generating functions. The generating function of any sequence (not just linear recurrences) (s_0,s_1,s_2,s_3, \ldots) is the following (assuming we denote it by S(x)):
Essentially, it is a power series whose coefficients are the given sequence. Another way of looking at it is that S(x) generates the sequence (s_0, s_1, \ldots). It is easy to show that (among many other nice properties):
 two generating functions are equal if and only if the sequences they generate are identical.
 the sum of two generating functions generates the sum of the two sequences they generate. In other words, if A(x) generates (a_i) and B(x) generates (b_i), then A(x)+B(x) generates (a_i+b_i).
Now, generating functions are useful in particular for linear recurrences. For example, for our sequence (A_0, A_1, \ldots) above, let A(x) be its generating function, i.e.
Then consider the following:
We have aligned terms with the same x^i. Now, try subtracting from the first equation all the other equations. You’ll see that the terms nicely cancel out, leaving only a few terms up to x^{k1}! In other words, we have the following equality:
for some polynomial P(x) of degree < k. If we let
then we can rewrite the above equality as:
Let’s try to understand what this equality actually means. It says that the any orderk linear recurrence (A_0, A_1, \ldots) has a rational generating function \frac{P(x)}{Q(x)} such that:
 Q(x) is of degree k, and contains the coefficients of the recurrence. Additionally, the constant term is 1 (or Q(0) = 1).
 P(x) is of degree < k, and along with Q(x) encodes the initial values of the recurrence (albeit in a nonintuitive format).
By reversing the process, we see that any rational generating function \frac{P(x)}{Q(x)} such that \operatorname{deg} P < \operatorname{deg} Q and Q(0) = 1 generates an orderk linear recurrence! (where k = \operatorname{deg} Q)
Now you might be wondering, what good will this do? It’s not clear yet, but remembering that equal generating functions generate the same sequence, we have the following:
The generating functions \frac{P(x)}{Q(x)} and \frac{P(x)R(x)}{Q(x)R(x)} generate the same sequence.
By selecting R(x) such that R(0) = 1, we find that the second generating function also generates a linear recurrence. Notice that they generate the same sequence, but the denominators are different! This means that we can derive a multitude of other recurrences satisfied by the original sequence (A_0,A_1,A_2,\ldots) this way!
As an example, consider the famous Fibonacci sequence, f_0 = 0, f_1 = 1 and f_i = f_{i1} + f_{i2}. One can easily show that its generating function F(x) is equal to \frac{x}{1xx^2}. Now, let’s say we multiply \frac{1+3x+x^2}{1+3x+x^2} to it. Then we will get
Since this is still the same function, we find that the Fibonacci numbers also satisfy the following recurrence:
You can check that it indeed works!
This gives us a bit of power to manipulate the coefficients of the recurrence, so let’s try to change it in a way that’s beneficial to us. One idea is this: A_i can be expressed as an orderk linear recurrence. Now, is it possible to express A_i as an order2k linear recurrence, but using only even offsets, in other words, in terms only of A_{i2}, A_{i4}, \ldots, A_{i2k}? Because if we can do this, then we can essentially ignore half the terms of the original sequence, and we can repeat the process in O(\log N) steps until N becomes sufficiently small.
In terms of the generating function \frac{P(x)}{Q(x)}, we want to find a degreek polynomial R(x) such that Q(x)R(x) contains only the even terms. Does there exist such an R(x)? Amazingly, yes, and it is simply Q(x)! (those who know Graeffe’s method will be familiar with this) This is easily shown by grouping the odd and even terms of Q(x) and Q(x), so one easily sees that Q(x)Q(x) contains only even terms. Alternatively, one can simply show that Q(x)Q(x) is an even function.
With this new polynomial, we can essentially ignore half the sequence. The only remaining step is to compute the k new initial values, which can be done by simply walking the recurrence k times, and selecting the odd or evenoffset terms, depending on the parity of N. The walking can be done in O(k^2) time, and Q(x)Q(x) itself can be computed in O(k^2) time also, so the whole algorithm takes O(k^2 \log N) time. This passes subtask 1!
As an example, suppose we want to compute the $101$st Fibonacci number: f_0 = 0, f_1 = 1 and f_i = f_{i1} + f_{i2}. Then:
 the denominator Q(x) is 1xx^2.
 Q(x)Q(x) is simply 13x^2+x^4. This means that f_i = 3f_{i2}  f_{i4}.
 Since N = 101 is odd, we will only consider the sequence (f_1, f_3, f_5, \ldots). Let g_i = f_{2i+1}. Then we have g_i = 3g_{i1}  g_{i2} with initial values g_0 = f_1 = 1 and g_1 = f_3 = 2.
 We now want to compute g_{\lfloor N/2 \rfloor} = g_{50}. We repeat this process until N becomes < 2.
Polynomials and linear recurrences
Unfortunately, the O(k^2) per step above seems really hard to optimize. The product Q(x)Q(x) can be computed quickly in O(k \log k) time using the fast Fourier transform (FFT) algorithm, but walking the recurrence k steps forward seems really hard to optimize (if you know of any way to do it quickly, please let me know!). Instead, we will look at a somewhat different solution (but the things we discussed above will still be used).
First, let’s try to factorize Q(x) = 1  C_1x  C_2x^2  \cdots  C_kx^k into linear terms, say Q(x) = (1r_1x)(1r_2x)\cdots(1r_kx). Suppose first, for simplicity, that all the r_i's are distinct, and none of them are zero. Then, we can decompose \frac{P(x)}{Q(x)} into partial fractions:
for some constants c_i. We don’t have to worry about how to compute them: we only need to know that this decomposition is possible.
Next, consider a single term \frac{c_t}{1r_tx}. Remember that:
therefore, we have
the $i$th term of this sequence is c_tr_t^i. Therefore, the $i$th term of the sequence generated by
which is just A_i, is equal to:
this is a nice form, but it doesn’t necessarily give a fast way to calculate A_N, because the roots are not necessarily rational. However, consider the polynomial M(x) defined by
Note that M(x) = x^kQ(1/x), so the roots of M(x) are precisely r_1, r_2, \ldots r_k.
Now, consider any polynomial that is a multiple of M(x), (so r_1, \ldots, r_k are also roots), say
Then since r_t is a root, we have
Multiplying by c_t, we have
Finally, summing the previous equation on t (from 1 to k), we get:
We have just shown the following:
Claim
Let m_lx^l + m_{l1}x^{l1} + m_{l2}x^{l2} + \cdots + m_0 be any polynomial divisible by M(x). Then:
This claim gives us a method to calculate A_N quickly, which will be described in the next section. But note that to show the claim, we have used the simplifying assumptions that the $r_i$s are distinct and nonzero. However, it is possible to prove the claim without those assumptions! The proof requires simple algebra and mathematical induction, which we will leave to the reader to prove. Therefore, the algorithm that will be described in the next section will actually work for any recurrence!
Fast solution: modular polynomial exponentiation
Consider the polynomial S(x) = x^N \bmod M(x). S(x) has degree less than k, since \operatorname{deg} M = k. Now let S(x) be
We know that x^N \equiv S(x) \pmod{M(x)}, in other words M(x) divides
By the above claim, therefore we have
or simply
Therefore, calculating A_N can be reduced to calculating the polynomial S(x) = x^N \bmod M(x). The latter can be done by using modular polynomial exponentiation! By raising the polynomial x to the $N$th power, and reducing intermediate polynomials modulo M(x) at every step, we ensure that the degree of the polynomials we are multiplying is always < k.
However, the method above only works quickly as long as we know how to multiply two polynomials quickly, and know how to take polynomials modulo M(x) quickly. Multiplication can be done in O(k \log k) time using FFT, but what about polynomial mod M(x)? Amazingly, it can also be done in O(k \log k) by reducing it into polynomial multiplication! This page describes one way it can be done.
Since each modular multiplication costs O(k \log k) time, the whole algorithm takes O(k \log k \log N) time if we use fast exponentiation!
Optimizations
When you implement this algorithm naïvely, you might notice that it takes too to process an input where k = 30000 (and for example N = 2^{59}1). Unfortunately, the constant hidden in the O notation seems to be too high! The time complexity is correct, so we will need additional tricks to cut down this constant and be able to squeeze the solution into the time limit. Here are a few optimizations:

When squaring a polynomial (such as in the above), you only need to compute the FFT once, not twice!

When computing A(x)B(x), revert to normal multiplication when \operatorname{deg}A or \operatorname{deg}B is small. Also, when computing A(x) \bmod B(x), revert to naive modulo when \operatorname{deg}B or \operatorname{deg}A\operatorname{deg}B is small. This is because algorithms with better time complexity usually have a larger constant hidden in the O notation than the slower ones (which is definitely true in our case), so it is beneficial to use the asymptotically slower ones for small inputs.

Although polynomial mod runs in O(k \log k) time, unfortunately the constant is huge, so it makes sense to optimize this part. If one reads the link given above, one discovers that to get A(x) \bmod B(x), one needs to get the multiplicative inverse of \operatorname{Rev}(B(x)) modulo x^L for some L. In fact, this computation is the slowest part of the algorithm.
But notice that B(x) is constant throughout our algorithm! (It’s always M(x)) Therefore, we could simply calculate the inverse once, before doing the fast exponentiation, to save a lot of time!

When multiplying with FFT, one usually needs to find a power of two that is larger than the degree of the product, i.e. 2^k > \operatorname{deg} A + \operatorname{deg} B. This means that if you’ve been calculating 2^k as 2^k > 2\max(\operatorname{deg} A, \operatorname{deg} B), then it’s time to change it! Although it’s a small thing, you might find that this still results in some substantial time savings.

Use the correct “fast exponentiation” technique! There are several ways to do fast exponenation, three of which are the following (assume that some of the variables represent polynomials):
Binary method
def pow1(b, e): ans = 1 i = 0 while 1 << i <= e: if e & (1 << i): ans \*= b b \*= b return ans
Squarefirst recursion
def pow2(b, e): if e == 0: return 1 if e is odd: return pow2(b, e  1) * b else: return pow2(b * b, e >> 1)
Squarelast recursion
def pow3(b, e): if e == 0: return 1 if e is odd: return pow2(b, e  1) * b else: result = pow2(b, e >> 1) return result * result
Notice the differences? They all require O(\log N) multiplications, but the squarelast recursion is more efficient for our purposes! Why? Because the base b doesn’t change! The worst case happens in all algorithms when N is a Mersenne number 2^k1, where it takes approximately 2k multiplications. However, for the squarelast recursion, half of those multiplications are simply multiplications by the base, which in our case is x. Multiplication by x is simple and requires only O(k) time. Afterwards, taking a degreek polynomial modulo M(x) can also be done in O(k) time. This effectively halves the number of FFT calls we need!
In addition to these, you may need to optimize your code in other ways if it still doesn’t pass the time limit.
Multiplication of polynomials modulo 104857601
Finally, we briefly describe how to compute the product of two polynomials modulo the prime p = 104857601. The usual FFT requires a complex $2^k$th roots of unity \omega, but in fact it is possible to perform the FFT in some finite fields \mathbb{Z}/p\mathbb{Z}! The idea is to replace \omega with a modular $2^k$th root of unity. The entire FFT algorithm still works!
However, the above only works when the field contains modular $2^k$th roots of unity. It is easily shown that \mathbb{Z}/p\mathbb{Z} contains $2^k$th roots of unity if and only if p is of the form 2^km+1 for some m. Thus, in our case, 104857601 = 2^{22}\cdot25+1, so there exists a primitive $2^{22}$th root of unity. 2^{22} is greater than four million, which is more than enough for our purposes!
Time Complexity:
O(k \log k \log N)