PROBLEM LINK:Author: Gaoyuan Chen DIFFICULTY:HARD PREREQUISITES:Generating functions, fast Fourier transform, number theoretic transform, polynomial mod, polynomial division PROBLEM:Let $(A_0, A_1, A_2, \ldots)$ be sequence defined by the recurrence: $$A_i = (C_1A_{i1} + C_2A_{i2} + C_3A_{i3} + \cdots + C_kA_{ik}) \bmod 104857601$$ $A_0, A_1, \ldots, A_{k1}$ are given, along with the coefficients $C_1, \ldots, C_k$. Given $N$, what is $A_N$? Note that in the problem, the sequence starts at $A_1$, not $A_0$, but the adjustment can be easily done and will greatly simplify this editorial. (To adjust it, simply shift $(A_1, \ldots, A_k)$ to $(A_0, \ldots, A_{k1})$, and decrement $N$ by $1$). QUICK EXPLANATION:Let $$M(x) = x^k  C_1x^{k1}  C_2x^{k2} + \cdots  C_{k1}x  C_k$$ Also, let $s_{k1}, s_{k2}, \ldots s_0$ be defined by the following congruence: $$x^N \equiv s_{k1}x^{k1} + s_{k2}x^{k2} + \cdots + s_1x + s_0 \pmod{104857601, M(x)}$$ Then we have the following amazing identity/congruence: $$A_N \equiv s_{k1}A_{k1} + s_{k2}A_{k2} + \cdots + s_1A_1 + s_0A_0 \pmod{104857601}$$ Computing $x^N \pmod{104857601, M(x)}$ can be done using fast exponentiation, fast polynomial multiplication and fast polynomial mod. Each multiplication and mod runs in $O(k \log k)$ time, and there are $\log N$ steps, so the algorithm runs in $O(k \log k \log N)$ time. Note that $\pmod{104857601, M(x)}$ means that the coefficients of polynomials are reduced modulo $104857601$, and the polynomials themselves are reduced modulo $M(x)$. Also, note that the implementation should be really efficient! We will discuss optimizations below. EXPLANATION:Let's start with the usual ways of solving linear recurrences. In the following, let's assume that $N \ge k$ (the case $N < k$ is trivial!). Furthermore, all integer arithmetic is assumed to be done modulo $104857601$, unless stated otherwise. Matrix exponentiation (too slow!)A popular way of getting the $N$th term of a linear recurrence quickly is to use matrix exponentiation. It works because of the following identity, which holds for all $i$ (we invite the reader to verify that this works): $$\left[\begin{array}{ccccc} C_1 & C_2 & C_3 & \cdots & C_k \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{array}\right] \left[\begin{array}{c} A_{i1} \\ A_{i2} \\ A_{i3} \\ A_{i4} \\ \vdots \\ A_{ik} \end{array}\right] = \left[\begin{array}{c} A_i \\ A_{i1} \\ A_{i2} \\ A_{i3} \\ \vdots \\ A_{ik+1} \end{array}\right]$$ From this, it follows (also using associativity of matrix multiplication) that $$\left[\begin{array}{ccccc} C_1 & C_2 & C_3 & \cdots & C_k \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{array}\right]^{Nk+1} \left[\begin{array}{c} A_{k1} \\ A_{k2} \\ A_{k3} \\ A_{k4} \\ \vdots \\ A_{0} \end{array}\right] = \left[\begin{array}{c} A_N \\ A_{N1} \\ A_{N2} \\ A_{N3} \\ \vdots \\ A_{Nk+1} \end{array}\right]$$ This means that we can get the answer, $A_N$, by simply raising a certain $k\times k$ matrix to the power $Nk+1$. One can use fast exponentiation for this! Unfortunately, this is slow to pass even the first subtask ($k \le 3000$). A naïve way of implementing this takes $O(k^3 \log N)$ time, because it takes $O(k^3)$ time to multiply two matrices, and there are $O(\log N)$ multiplications to be done. Even using the Strassen algorithm for multiplying matrices still takes too slow: $O(k^{2.81} \log N)$. Using more advanced matrix multiplication techniques is not worth it because the constant factors are large, and in any case there is a lower bound of $\Omega(k^2 \log k)$ for it, so it won't pass the last subtask anyhow. Clearly, we need to find a better approach. Generating functions (passes subtask 1)We can derive a much faster solution by means of generating functions. The generating function of any sequence (not just linear recurrences) $(s_0,s_1,s_2,s_3, \ldots)$ is the following (assuming we denote it by $S(x)$): $$S(x) = s_0 + s_1x + s_2x^2 + s_3x^3 + \ldots$$ Essentially, it is a power series whose coefficients are the given sequence. Another way of looking at it is that $S(x)$ generates the sequence $(s_0, s_1, \ldots)$. It is easy to show that (among many other nice properties):
Now, generating functions are useful in particular for linear recurrences. For example, for our sequence $(A_0, A_1, \ldots)$ above, let $A(x)$ be its generating function, i.e. $$A(x) = A_0 + A_1x + A_2x^2 + A_3x^3 + \ldots$$ Then consider the following: $$\begin{array}{rrrrrrrrrrrr} A(x) &=& A_0 &{}+& A_1x &{}+& A_2x^2 &{}+ \cdots &{}+& A_kx^k &{}+& A_{k+1}x^{k+1} &{}+ \cdots \\ C_1x A(x) &=& & & C_1A_0x &{}+& C_1A_1x^2 &{}+ \cdots &{}+& C_1A_{k1}x^k &{}+& C_1A_kx^{k+1} &{}+ \cdots \\ C_2x^2A(x) &=& & & & & C_2A_0x^2 &{}+ \cdots &{}+& C_2A_{k2}x^k &{}+& C_2A_{k1}x^{k+1} &{}+ \cdots \\ C_3x^3A(x) &=& & & & & & \cdots &{}+& C_3A_{k3}x^k &{}+& C_3A_{k2}x^{k+1} &{}+ \cdots \\ \ldots & & & & & & & & & \ldots & &\ldots & \\ C_kx^kA(x) &=& & & & & & & & C_kA_0x^k &{}+& C_kA_1x^{k+1} &{}+ \cdots \end{array}$$ We have aligned terms with the same $x^i$. Now, try subtracting from the first equation all the other equations. You'll see that the terms nicely cancel out, leaving only a few terms up to $x^{k1}$! In other words, we have the following equality: $$(1  C_1x  C_2x^2  C_3x^3  \cdots  C_kx^k)A(x) = P(x)$$ for some polynomial $P(x)$ of degree $< k$. If we let $$Q(x) = 1  C_1x  C_2x^2  C_3x^3  \cdots  C_kx^k$$ then we can rewrite the above equality as: $$A(x) = \frac{P(x)}{Q(x)}$$ Let's try to understand what this equality actually means. It says that the any order$k$ linear recurrence $(A_0, A_1, \ldots)$ has a rational generating function $\frac{P(x)}{Q(x)}$ such that:
By reversing the process, we see that any rational generating function $\frac{P(x)}{Q(x)}$ such that $\operatorname{deg} P < \operatorname{deg} Q$ and $Q(0) = 1$ generates an order$k$ linear recurrence! (where $k = \operatorname{deg} Q$) Now you might be wondering, what good will this do? It's not clear yet, but remembering that equal generating functions generate the same sequence, we have the following: The generating functions $\frac{P(x)}{Q(x)}$ and $\frac{P(x)R(x)}{Q(x)R(x)}$ generate the same sequence. By selecting $R(x)$ such that $R(0) = 1$, we find that the second generating function also generates a linear recurrence. Notice that they generate the same sequence, but the denominators are different! This means that we can derive a multitude of other recurrences satisfied by the original sequence $(A_0,A_1,A_2,\ldots)$ this way! As an example, consider the famous Fibonacci sequence, $f_0 = 0, f_1 = 1$ and $f_i = f_{i1} + f_{i2}$. One can easily show that its generating function $F(x)$ is equal to $\frac{x}{1xx^2}$. Now, let's say we multiply $\frac{1+3x+x^2}{1+3x+x^2}$ to it. Then we will get $$F(x) = \frac{x(1+3x+x^2)}{1+2x3x^24x^3x^4}$$ Since this is still the same function, we find that the Fibonacci numbers also satisfy the following recurrence: $$f_i = 2f_{i1} + 3f_{i2} + 4f_{i3} + f_{i4}$$ You can check that it indeed works! This gives us a bit of power to manipulate the coefficients of the recurrence, so let's try to change it in a way that's beneficial to us. One idea is this: $A_i$ can be expressed as an order$k$ linear recurrence. Now, is it possible to express $A_i$ as an order$2k$ linear recurrence, but using only even offsets, in other words, in terms only of $A_{i2}, A_{i4}, \ldots, A_{i2k}$? Because if we can do this, then we can essentially ignore half the terms of the original sequence, and we can repeat the process in $O(\log N)$ steps until $N$ becomes sufficiently small. In terms of the generating function $\frac{P(x)}{Q(x)}$, we want to find a degree$k$ polynomial $R(x)$ such that $Q(x)R(x)$ contains only the even terms. Does there exist such an $R(x)$? Amazingly, yes, and it is simply $Q(x)$! (those who know Graeffe's method will be familiar with this) This is easily shown by grouping the odd and even terms of $Q(x)$ and $Q(x)$, so one easily sees that $Q(x)Q(x)$ contains only even terms. Alternatively, one can simply show that $Q(x)Q(x)$ is an even function. With this new polynomial, we can essentially ignore half the sequence. The only remaining step is to compute the $k$ new initial values, which can be done by simply walking the recurrence $k$ times, and selecting the odd or evenoffset terms, depending on the parity of $N$. The walking can be done in $O(k^2)$ time, and $Q(x)Q(x)$ itself can be computed in $O(k^2)$ time also, so the whole algorithm takes $O(k^2 \log N)$ time. This passes subtask 1! As an example, suppose we want to compute the $101$st Fibonacci number: $f_0 = 0, f_1 = 1$ and $f_i = f_{i1} + f_{i2}$. Then:
Polynomials and linear recurrencesUnfortunately, the $O(k^2)$ per step above seems really hard to optimize. The product $Q(x)Q(x)$ can be computed quickly in $O(k \log k)$ time using the fast Fourier transform (FFT) algorithm, but walking the recurrence $k$ steps forward seems really hard to optimize (if you know of any way to do it quickly, please let me know!). Instead, we will look at a somewhat different solution (but the things we discussed above will still be used). First, let's try to factorize $Q(x) = 1  C_1x  C_2x^2  \cdots  C_kx^k$ into linear terms, say $Q(x) = (1r_1x)(1r_2x)\cdots(1r_kx)$. Suppose first, for simplicity, that all the $r_i$'s are distinct, and none of them are zero. Then, we can decompose $\frac{P(x)}{Q(x)}$ into partial fractions: $$\frac{P(x)}{Q(x)} = \frac{P(x)}{(1r_1x)(1r_2x)\cdots(1r_kx)} = \frac{c_1}{1r_1x}+\frac{c_2}{1r_2x}+\cdots+\frac{c_k}{1r_kx}$$ for some constants $c_i$. We don't have to worry about how to compute them: we only need to know that this decomposition is possible. Next, consider a single term $\frac{c_t}{1r_tx}$. Remember that: $$\frac{1}{1y} = 1+y+y^2+y^3+\cdots$$ therefore, we have $$\frac{c_t}{1r_tx} = c_t + c_tr_tx + c_tr_t^2x^2 + c_tr_t^3x^3 + \cdots$$ the $i$th term of this sequence is $c_tr_t^i$. Therefore, the $i$th term of the sequence generated by $$\frac{P(x)}{Q(x)} = \frac{c_1}{1r_1x}+\frac{c_2}{1r_2x}+\cdots+\frac{c_k}{1r_kx}$$ which is just $A_i$, is equal to: $$A_i = c_1r_1^i + c_2r_2^i + c_3r_3^i + \cdots + c_kr_k^i$$ this is a nice form, but it doesn't necessarily give a fast way to calculate $A_N$, because the roots are not necessarily rational. However, consider the polynomial $M(x)$ defined by $$M(x) = x^k  C_1x^{k1}  C_2x^{k2}  \cdots  C_k$$ Note that $M(x) = x^kQ(1/x)$, so the roots of $M(x)$ are precisely $r_1, r_2, \ldots r_k$. Now, consider any polynomial that is a multiple of $M(x)$, (so $r_1, \ldots, r_k$ are also roots), say $$M'(x) = m_lx^l + m_{l1}x^{l1} + m_{l2}x^{l2} + \cdots + m_0$$ Then since $r_t$ is a root, we have $$0 = m_lr_t^l + m_{l1}r_t^{l1} + m_{l2}r_t^{l2} + \cdots + m_0$$ Multiplying by $c_t$, we have $$0 = m_lc_tr_t^l + m_{l1}c_tr_t^{l1} + m_{l2}c_tr_t^{l2} + \cdots + m_0c_t$$ Finally, summing the previous equation on $t$ (from $1$ to $k$), we get: $$0 = m_lA_l + m_{l1}A_{l1} + m_{l2}A_{l2} + \cdots + m_0A_0$$ We have just shown the following: Claim Let $m_lx^l + m_{l1}x^{l1} + m_{l2}x^{l2} + \cdots + m_0$ be any polynomial divisible by $M(x)$. Then: $$m_lA_l + m_{l1}A_{l1} + m_{l2}A_{l2} + \cdots + m_0 = 0$$ This claim gives us a method to calculate $A_N$ quickly, which will be described in the next section. But note that to show the claim, we have used the simplifying assumptions that the $r_i$s are distinct and nonzero. However, it is possible to prove the claim without those assumptions! The proof requires simple algebra and mathematical induction, which we will leave to the reader to prove. Therefore, the algorithm that will be described in the next section will actually work for any recurrence! Fast solution: modular polynomial exponentiationConsider the polynomial $S(x) = x^N \bmod M(x)$. $S(x)$ has degree less than $k$, since $\operatorname{deg} M = k$. Now let $S(x)$ be $$S(x) = s_{k1}x^{k1} + s_{k2}x^{k2} + \cdots + s_1x + s_0$$ We know that $x^N \equiv S(x) \pmod{M(x)}$, in other words $M(x)$ divides $$x^N  S(x) = x^N  s_{k1}x^{k1}  s_{k2}x^{k2}  \cdots  s_1x  s_0$$ By the above claim, therefore we have $$A_N  s_{k1}A_{k1}  s_{k2}A_{k2}  \cdots  s_1A_1  s_0A_0 = 0$$ or simply $$A_N = s_{k1}A_{k1} + s_{k2}A_{k2} + \cdots + s_1A_1 + s_0A_0$$ Therefore, calculating $A_N$ can be reduced to calculating the polynomial $S(x) = x^N \bmod M(x)$. The latter can be done by using modular polynomial exponentiation! By raising the polynomial $x$ to the $N$th power, and reducing intermediate polynomials modulo $M(x)$ at every step, we ensure that the degree of the polynomials we are multiplying is always $< k$. However, the method above only works quickly as long as we know how to multiply two polynomials quickly, and know how to take polynomials modulo $M(x)$ quickly. Multiplication can be done in $O(k \log k)$ time using FFT, but what about polynomial mod $M(x)$? Amazingly, it can also be done in $O(k \log k)$ by reducing it into polynomial multiplication! This page describes one way it can be done. Since each modular multiplication costs $O(k \log k)$ time, the whole algorithm takes $O(k \log k \log N)$ time if we use fast exponentiation! OptimizationsWhen you implement this algorithm naïvely, you might notice that it takes too to process an input where $k = 30000$ (and for example $N = 2^{59}1$). Unfortunately, the constant hidden in the $O$ notation seems to be too high! The time complexity is correct, so we will need additional tricks to cut down this constant and be able to squeeze the solution into the time limit. Here are a few optimizations:
Binary method
Squarefirst recursion
Squarelast recursion
Notice the differences? They all require $O(\log N)$ multiplications, but the squarelast recursion is more efficient for our purposes! Why? Because the base $b$ doesn't change! The worst case happens in all algorithms when $N$ is a Mersenne number $2^k1$, where it takes approximately $2k$ multiplications. However, for the squarelast recursion, half of those multiplications are simply multiplications by the base, which in our case is $x$. Multiplication by $x$ is simple and requires only $O(k)$ time. Afterwards, taking a degree$k$ polynomial modulo $M(x)$ can also be done in $O(k)$ time. This effectively halves the number of FFT calls we need! In addition to these, you may need to optimize your code in other ways if it still doesn't pass the time limit. Multiplication of polynomials modulo $104857601$Finally, we briefly describe how to compute the product of two polynomials modulo the prime $p = 104857601$. The usual FFT requires a complex $2^k$th roots of unity $\omega$, but in fact it is possible to perform the FFT in some finite fields $\mathbb{Z}/p\mathbb{Z}$! The idea is to replace $\omega$ with a modular $2^k$th root of unity. The entire FFT algorithm still works! However, the above only works when the field contains modular $2^k$th roots of unity. It is easily shown that $\mathbb{Z}/p\mathbb{Z}$ contains $2^k$th roots of unity if and only if $p$ is of the form $2^km+1$ for some $m$. Thus, in our case, $104857601 = 2^{22}\cdot25+1$, so there exists a primitive $2^{22}$th root of unity. $2^{22}$ is greater than four million, which is more than enough for our purposes! Time Complexity:$O(k \log k \log N)$ AUTHOR'S AND TESTER'S SOLUTIONS:
This question is marked "community wiki".
asked 15 Mar '15, 02:17

Great editorial. It seems like the generating function operations can also be applied to define a general recurrence relation for two distinct sequences, to distinguish between them we would start with different initial values. I've read and reread the generating function part again and again and even looked at different articles online but, the part about multiplying the denominator by another generating function and being able to reproduce the same sequence but with a different recurrence relation still does not seem intuitive for me. If anyone has a source that explains more about this or approaches it from a different angle I would highly appreciate it. answered 19 Oct '16, 17:50

I've solved the problem up to computing the inverse of M(x) and I haven't fully understood it yet. What's the value of L here: 'one needs to get the multiplicative inverse of Rev(B(x)) modulo x^L for some L'? How exactly do you compute the inverse (maybe it'll be readable in Setter's solution...)? answered 16 Mar '15, 16:50

Thanks for the great editorial! I wonder that the idea in "Generating functions (passes subtask 1)" gives a solution with O(k log k log N) time.
Our goal is to compute the coefficient of x^N of P(x)/Q(x), which we denote by P(x)/Q(x) [N]. As written in the editorial, P(x)/Q(x) = P(x)Q(x) / (Q(x)Q(x)). Since we are only interested in the coefficient of x^N, we only have to consider half of the terms of P(x)Q(x). For instance, if N is even, all odd degree terms of P(x)Q(x) do not matter since the denominator Q(x)Q(x) is an even polynomial. Hence, P(x)/Q(x) [N] = E(P(x)Q(x)) / (Q(x)Q(x)) [N] = P'(x)/Q'(x) [N/2] where E(R(x)) denotes the polynomial consists only of even degree terms of R(x), P'(x^2) := E(P(x)Q(x)) and where Q'(x^2) := Q(x)Q(x). If N is odd, we obtain a similar recursion, P(x)/Q(x) [N] = O(P(x)Q(x)) / (Q(x)Q(x)) [N] = P'(x)/Q'(x) [(N1)/2] where O(R(x)) denotes the polynomial consists only of odd degree terms of R(x), P'(x^2) := O(P(x)Q(x))/x and where Q'(x^2) := Q(x)Q(x). Here, the degrees of P' and Q' are at most k1 and k, respectively. Then, we obtain the recursion in which the complexity of each step is O(k log k) by FFT, and the number of steps is O(log N). Hence, we obtain O(k log k log N) time algorithm. answered 01 Aug '18, 13:29
