RNG - Editorial

kevinsogo · March 15, 2015, 2:17am

PROBLEM LINK:

Practice
Contest

Author: Gaoyuan Chen
Tester: Hiroto Sekido
Editorialist: Kevin Atienza

DIFFICULTY:

HARD

PREREQUISITES:

Generating functions, fast Fourier transform, number theoretic transform, polynomial mod, polynomial division

PROBLEM:

Let (A_0, A_1, A_2, \ldots) be sequence defined by the recurrence:

A_i = (C_1A_{i-1} + C_2A_{i-2} + C_3A_{i-3} + \cdots + C_kA_{i-k}) \bmod 104857601

A_0, A_1, \ldots, A_{k-1} are given, along with the coefficients C_1, \ldots, C_k. Given N, what is A_N?

Note that in the problem, the sequence starts at A_1, not A_0, but the adjustment can be easily done and will greatly simplify this editorial. (To adjust it, simply shift (A_1, \ldots, A_k) to (A_0, \ldots, A_{k-1}), and decrement N by 1).

QUICK EXPLANATION:

Let

M(x) = x^k - C_1x^{k-1} - C_2x^{k-2} + \cdots - C_{k-1}x - C_k

Also, let s_{k-1}, s_{k-2}, \ldots s_0 be defined by the following congruence:

x^N \equiv s_{k-1}x^{k-1} + s_{k-2}x^{k-2} + \cdots + s_1x + s_0 \pmod{104857601, M(x)}

Then we have the following amazing identity/congruence:

A_N \equiv s_{k-1}A_{k-1} + s_{k-2}A_{k-2} + \cdots + s_1A_1 + s_0A_0 \pmod{104857601}

Computing x^N \pmod{104857601, M(x)} can be done using fast exponentiation, fast polynomial multiplication and fast polynomial mod. Each multiplication and mod runs in O(k \log k) time, and there are \log N steps, so the algorithm runs in O(k \log k \log N) time.

Note that \pmod{104857601, M(x)} means that the coefficients of polynomials are reduced modulo 104857601, and the polynomials themselves are reduced modulo M(x). Also, note that the implementation should be really efficient! We will discuss optimizations below.

EXPLANATION:

Let’s start with the usual ways of solving linear recurrences. In the following, let’s assume that N \ge k (the case N < k is trivial!). Furthermore, all integer arithmetic is assumed to be done modulo 104857601, unless stated otherwise.

Matrix exponentiation (too slow!)

A popular way of getting the N th term of a linear recurrence quickly is to use matrix exponentiation. It works because of the following identity, which holds for all i (we invite the reader to verify that this works):

\left[\begin{array}{ccccc} C_1 & C_2 & C_3 & \cdots & C_k \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{array}\right] \left[\begin{array}{c} A_{i-1} \\ A_{i-2} \\ A_{i-3} \\ A_{i-4} \\ \vdots \\ A_{i-k} \end{array}\right] = \left[\begin{array}{c} A_i \\ A_{i-1} \\ A_{i-2} \\ A_{i-3} \\ \vdots \\ A_{i-k+1} \end{array}\right]

From this, it follows (also using associativity of matrix multiplication) that

\left[\begin{array}{ccccc} C_1 & C_2 & C_3 & \cdots & C_k \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{array}\right]^{N-k+1} \left[\begin{array}{c} A_{k-1} \\ A_{k-2} \\ A_{k-3} \\ A_{k-4} \\ \vdots \\ A_{0} \end{array}\right] = \left[\begin{array}{c} A_N \\ A_{N-1} \\ A_{N-2} \\ A_{N-3} \\ \vdots \\ A_{N-k+1} \end{array}\right]

This means that we can get the answer, A_N, by simply raising a certain k\times k matrix to the power N-k+1. One can use fast exponentiation for this!

Unfortunately, this is slow to pass even the first subtask (k \le 3000). A naïve way of implementing this takes O(k^3 \log N) time, because it takes O(k^3) time to multiply two matrices, and there are O(\log N) multiplications to be done. Even using the Strassen algorithm for multiplying matrices still takes too slow: O(k^{2.81} \log N). Using more advanced matrix multiplication techniques is not worth it because the constant factors are large, and in any case there is a lower bound of \Omega(k^2 \log k) for it, so it won’t pass the last subtask anyhow. Clearly, we need to find a better approach.

Generating functions (passes subtask 1)

We can derive a much faster solution by means of generating functions. The generating function of any sequence (not just linear recurrences) (s_0,s_1,s_2,s_3, \ldots) is the following (assuming we denote it by S(x)):

S(x) = s_0 + s_1x + s_2x^2 + s_3x^3 + \ldots

Essentially, it is a power series whose coefficients are the given sequence. Another way of looking at it is that S(x) generates the sequence (s_0, s_1, \ldots). It is easy to show that (among many other nice properties):

two generating functions are equal if and only if the sequences they generate are identical.
the sum of two generating functions generates the sum of the two sequences they generate. In other words, if A(x) generates (a_i) and B(x) generates (b_i), then A(x)+B(x) generates (a_i+b_i).

Now, generating functions are useful in particular for linear recurrences. For example, for our sequence (A_0, A_1, \ldots) above, let A(x) be its generating function, i.e.

A(x) = A_0 + A_1x + A_2x^2 + A_3x^3 + \ldots

Then consider the following:

\begin{array}{rrrrrrrrrrrr} A(x) &=& A_0 &{}+& A_1x &{}+& A_2x^2 &{}+ \cdots &{}+& A_kx^k &{}+& A_{k+1}x^{k+1} &{}+ \cdots \\ C_1x A(x) &=& & & C_1A_0x &{}+& C_1A_1x^2 &{}+ \cdots &{}+& C_1A_{k-1}x^k &{}+& C_1A_kx^{k+1} &{}+ \cdots \\ C_2x^2A(x) &=& & & & & C_2A_0x^2 &{}+ \cdots &{}+& C_2A_{k-2}x^k &{}+& C_2A_{k-1}x^{k+1} &{}+ \cdots \\ C_3x^3A(x) &=& & & & & & \cdots &{}+& C_3A_{k-3}x^k &{}+& C_3A_{k-2}x^{k+1} &{}+ \cdots \\ \ldots & & & & & & & & & \ldots & &\ldots & \\ C_kx^kA(x) &=& & & & & & & & C_kA_0x^k &{}+& C_kA_1x^{k+1} &{}+ \cdots \end{array}

We have aligned terms with the same x^i. Now, try subtracting from the first equation all the other equations. You’ll see that the terms nicely cancel out, leaving only a few terms up to x^{k-1}! In other words, we have the following equality:

(1 - C_1x - C_2x^2 - C_3x^3 - \cdots - C_kx^k)A(x) = P(x)

for some polynomial P(x) of degree < k. If we let

Q(x) = 1 - C_1x - C_2x^2 - C_3x^3 - \cdots - C_kx^k

then we can rewrite the above equality as:

A(x) = \frac{P(x)}{Q(x)}

Let’s try to understand what this equality actually means. It says that the any order-k linear recurrence (A_0, A_1, \ldots) has a rational generating function \frac{P(x)}{Q(x)} such that:

Q(x) is of degree k, and contains the coefficients of the recurrence. Additionally, the constant term is 1 (or Q(0) = 1).
P(x) is of degree < k, and along with Q(x) encodes the initial values of the recurrence (albeit in a nonintuitive format).

By reversing the process, we see that any rational generating function \frac{P(x)}{Q(x)} such that \operatorname{deg} P < \operatorname{deg} Q and Q(0) = 1 generates an order-k linear recurrence! (where k = \operatorname{deg} Q)

Now you might be wondering, what good will this do? It’s not clear yet, but remembering that equal generating functions generate the same sequence, we have the following:

The generating functions \frac{P(x)}{Q(x)} and \frac{P(x)R(x)}{Q(x)R(x)} generate the same sequence.

By selecting R(x) such that R(0) = 1, we find that the second generating function also generates a linear recurrence. Notice that they generate the same sequence, but the denominators are different! This means that we can derive a multitude of other recurrences satisfied by the original sequence (A_0,A_1,A_2,\ldots) this way!

As an example, consider the famous Fibonacci sequence, f_0 = 0, f_1 = 1 and f_i = f_{i-1} + f_{i-2}. One can easily show that its generating function F(x) is equal to \frac{x}{1-x-x^2}. Now, let’s say we multiply \frac{1+3x+x^2}{1+3x+x^2} to it. Then we will get

F(x) = \frac{x(1+3x+x^2)}{1+2x-3x^2-4x^3-x^4}

Since this is still the same function, we find that the Fibonacci numbers also satisfy the following recurrence:

f_i = -2f_{i-1} + 3f_{i-2} + 4f_{i-3} + f_{i-4}

You can check that it indeed works!

This gives us a bit of power to manipulate the coefficients of the recurrence, so let’s try to change it in a way that’s beneficial to us. One idea is this: A_i can be expressed as an order-k linear recurrence. Now, is it possible to express A_i as an order-2k linear recurrence, but using only even offsets, in other words, in terms only of A_{i-2}, A_{i-4}, \ldots, A_{i-2k}? Because if we can do this, then we can essentially ignore half the terms of the original sequence, and we can repeat the process in O(\log N) steps until N becomes sufficiently small.

In terms of the generating function \frac{P(x)}{Q(x)}, we want to find a degree-k polynomial R(x) such that Q(x)R(x) contains only the even terms. Does there exist such an R(x)? Amazingly, yes, and it is simply Q(-x)! (those who know Graeffe’s method will be familiar with this) This is easily shown by grouping the odd and even terms of Q(x) and Q(-x), so one easily sees that Q(x)Q(-x) contains only even terms. Alternatively, one can simply show that Q(x)Q(-x) is an even function.

With this new polynomial, we can essentially ignore half the sequence. The only remaining step is to compute the k new initial values, which can be done by simply walking the recurrence k times, and selecting the odd- or even-offset terms, depending on the parity of N. The walking can be done in O(k^2) time, and Q(x)Q(-x) itself can be computed in O(k^2) time also, so the whole algorithm takes O(k^2 \log N) time. This passes subtask 1!

As an example, suppose we want to compute the $101$st Fibonacci number: f_0 = 0, f_1 = 1 and f_i = f_{i-1} + f_{i-2}. Then:

the denominator Q(x) is 1-x-x^2.
Q(x)Q(-x) is simply 1-3x^2+x^4. This means that f_i = 3f_{i-2} - f_{i-4}.
Since N = 101 is odd, we will only consider the sequence (f_1, f_3, f_5, \ldots). Let g_i = f_{2i+1}. Then we have g_i = 3g_{i-1} - g_{i-2} with initial values g_0 = f_1 = 1 and g_1 = f_3 = 2.
We now want to compute g_{\lfloor N/2 \rfloor} = g_{50}. We repeat this process until N becomes < 2.

Polynomials and linear recurrences

Unfortunately, the O(k^2) per step above seems really hard to optimize. The product Q(x)Q(-x) can be computed quickly in O(k \log k) time using the fast Fourier transform (FFT) algorithm, but walking the recurrence k steps forward seems really hard to optimize (if you know of any way to do it quickly, please let me know!). Instead, we will look at a somewhat different solution (but the things we discussed above will still be used).

First, let’s try to factorize Q(x) = 1 - C_1x - C_2x^2 - \cdots - C_kx^k into linear terms, say Q(x) = (1-r_1x)(1-r_2x)\cdots(1-r_kx). Suppose first, for simplicity, that all the r_i's are distinct, and none of them are zero. Then, we can decompose \frac{P(x)}{Q(x)} into partial fractions:

\frac{P(x)}{Q(x)} = \frac{P(x)}{(1-r_1x)(1-r_2x)\cdots(1-r_kx)} = \frac{c_1}{1-r_1x}+\frac{c_2}{1-r_2x}+\cdots+\frac{c_k}{1-r_kx}

for some constants c_i. We don’t have to worry about how to compute them: we only need to know that this decomposition is possible.

Next, consider a single term \frac{c_t}{1-r_tx}. Remember that:

\frac{1}{1-y} = 1+y+y^2+y^3+\cdots

therefore, we have

\frac{c_t}{1-r_tx} = c_t + c_tr_tx + c_tr_t^2x^2 + c_tr_t^3x^3 + \cdots

the i th term of this sequence is c_tr_t^i. Therefore, the i th term of the sequence generated by

\frac{P(x)}{Q(x)} = \frac{c_1}{1-r_1x}+\frac{c_2}{1-r_2x}+\cdots+\frac{c_k}{1-r_kx}

which is just A_i, is equal to:

A_i = c_1r_1^i + c_2r_2^i + c_3r_3^i + \cdots + c_kr_k^i

this is a nice form, but it doesn’t necessarily give a fast way to calculate A_N, because the roots are not necessarily rational. However, consider the polynomial M(x) defined by

M(x) = x^k - C_1x^{k-1} - C_2x^{k-2} - \cdots - C_k

Note that M(x) = x^kQ(1/x), so the roots of M(x) are precisely r_1, r_2, \ldots r_k.

Now, consider any polynomial that is a multiple of M(x), (so r_1, \ldots, r_k are also roots), say

M'(x) = m_lx^l + m_{l-1}x^{l-1} + m_{l-2}x^{l-2} + \cdots + m_0

Then since r_t is a root, we have

0 = m_lr_t^l + m_{l-1}r_t^{l-1} + m_{l-2}r_t^{l-2} + \cdots + m_0

Multiplying by c_t, we have

0 = m_lc_tr_t^l + m_{l-1}c_tr_t^{l-1} + m_{l-2}c_tr_t^{l-2} + \cdots + m_0c_t

Finally, summing the previous equation on t (from 1 to k), we get:

0 = m_lA_l + m_{l-1}A_{l-1} + m_{l-2}A_{l-2} + \cdots + m_0A_0

We have just shown the following:

Claim

Let m_lx^l + m_{l-1}x^{l-1} + m_{l-2}x^{l-2} + \cdots + m_0 be any polynomial divisible by M(x). Then:

m_lA_l + m_{l-1}A_{l-1} + m_{l-2}A_{l-2} + \cdots + m_0 = 0

This claim gives us a method to calculate A_N quickly, which will be described in the next section. But note that to show the claim, we have used the simplifying assumptions that the $r_i$s are distinct and nonzero. However, it is possible to prove the claim without those assumptions! The proof requires simple algebra and mathematical induction, which we will leave to the reader to prove. Therefore, the algorithm that will be described in the next section will actually work for any recurrence!

Fast solution: modular polynomial exponentiation

Consider the polynomial S(x) = x^N \bmod M(x). S(x) has degree less than k, since \operatorname{deg} M = k. Now let S(x) be

S(x) = s_{k-1}x^{k-1} + s_{k-2}x^{k-2} + \cdots + s_1x + s_0

We know that x^N \equiv S(x) \pmod{M(x)}, in other words M(x) divides

x^N - S(x) = x^N - s_{k-1}x^{k-1} - s_{k-2}x^{k-2} - \cdots - s_1x - s_0

By the above claim, therefore we have

A_N - s_{k-1}A_{k-1} - s_{k-2}A_{k-2} - \cdots - s_1A_1 - s_0A_0 = 0

or simply

A_N = s_{k-1}A_{k-1} + s_{k-2}A_{k-2} + \cdots + s_1A_1 + s_0A_0

Therefore, calculating A_N can be reduced to calculating the polynomial S(x) = x^N \bmod M(x). The latter can be done by using modular polynomial exponentiation! By raising the polynomial x to the N th power, and reducing intermediate polynomials modulo M(x) at every step, we ensure that the degree of the polynomials we are multiplying is always < k.

However, the method above only works quickly as long as we know how to multiply two polynomials quickly, and know how to take polynomials modulo M(x) quickly. Multiplication can be done in O(k \log k) time using FFT, but what about polynomial mod M(x)? Amazingly, it can also be done in O(k \log k) by reducing it into polynomial multiplication! This page describes one way it can be done.

Since each modular multiplication costs O(k \log k) time, the whole algorithm takes O(k \log k \log N) time if we use fast exponentiation!

Optimizations

When you implement this algorithm naïvely, you might notice that it takes too to process an input where k = 30000 (and for example N = 2^{59}-1). Unfortunately, the constant hidden in the O notation seems to be too high! The time complexity is correct, so we will need additional tricks to cut down this constant and be able to squeeze the solution into the time limit. Here are a few optimizations:

When squaring a polynomial (such as in the above), you only need to compute the FFT once, not twice!
When computing A(x)B(x), revert to normal multiplication when \operatorname{deg}A or \operatorname{deg}B is small. Also, when computing A(x) \bmod B(x), revert to naive modulo when \operatorname{deg}B or \operatorname{deg}A-\operatorname{deg}B is small. This is because algorithms with better time complexity usually have a larger constant hidden in the O notation than the slower ones (which is definitely true in our case), so it is beneficial to use the asymptotically slower ones for small inputs.
Although polynomial mod runs in O(k \log k) time, unfortunately the constant is huge, so it makes sense to optimize this part. If one reads the link given above, one discovers that to get A(x) \bmod B(x), one needs to get the multiplicative inverse of \operatorname{Rev}(B(x)) modulo x^L for some L. In fact, this computation is the slowest part of the algorithm.

But notice that B(x) is constant throughout our algorithm! (It’s always M(x)) Therefore, we could simply calculate the inverse once, before doing the fast exponentiation, to save a lot of time!
When multiplying with FFT, one usually needs to find a power of two that is larger than the degree of the product, i.e. 2^k > \operatorname{deg} A + \operatorname{deg} B. This means that if you’ve been calculating 2^k as 2^k > 2\max(\operatorname{deg} A, \operatorname{deg} B), then it’s time to change it! Although it’s a small thing, you might find that this still results in some substantial time savings.
Use the correct “fast exponentiation” technique! There are several ways to do fast exponenation, three of which are the following (assume that some of the variables represent polynomials):

Binary method


  def pow1(b, e):
      ans = 1
      i = 0
      while 1 << i <= e:
          if e & (1 << i):
              ans \*= b
          b \*= b
      return ans

Square-first recursion


  def pow2(b, e):
      if e == 0:
          return 1
      if e is odd:
          return pow2(b, e - 1) * b
      else:
          return pow2(b * b, e >> 1)

Square-last recursion


  def pow3(b, e):
      if e == 0:
          return 1
      if e is odd:
          return pow2(b, e - 1) * b
      else:
          result = pow2(b, e >> 1)
          return result * result

Notice the differences? They all require O(\log N) multiplications, but the square-last recursion is more efficient for our purposes! Why? Because the base b doesn’t change! The worst case happens in all algorithms when N is a Mersenne number 2^k-1, where it takes approximately 2k multiplications. However, for the square-last recursion, half of those multiplications are simply multiplications by the base, which in our case is x. Multiplication by x is simple and requires only O(k) time. Afterwards, taking a degree-k polynomial modulo M(x) can also be done in O(k) time. This effectively halves the number of FFT calls we need!

In addition to these, you may need to optimize your code in other ways if it still doesn’t pass the time limit.

Multiplication of polynomials modulo 104857601

Finally, we briefly describe how to compute the product of two polynomials modulo the prime p = 104857601. The usual FFT requires a complex $2^k$th roots of unity \omega, but in fact it is possible to perform the FFT in some finite fields \mathbb{Z}/p\mathbb{Z}! The idea is to replace \omega with a modular $2^k$th root of unity. The entire FFT algorithm still works!

However, the above only works when the field contains modular $2^k$th roots of unity. It is easily shown that \mathbb{Z}/p\mathbb{Z} contains $2^k$th roots of unity if and only if p is of the form 2^km+1 for some m. Thus, in our case, 104857601 = 2^{22}\cdot25+1, so there exists a primitive $2^{22}$th root of unity. 2^{22} is greater than four million, which is more than enough for our purposes!

Time Complexity:

O(k \log k \log N)

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter
Tester

daiver19 · March 16, 2015, 4:50pm

I’ve solved the problem up to computing the inverse of M(x) and I haven’t fully understood it yet.
What’s the value of L here: ‘one needs to get the multiplicative inverse of Rev(B(x)) modulo x^L for some L’? How exactly do you compute the inverse (maybe it’ll be readable in Setter’s solution…)?

argos · March 16, 2015, 8:55pm

I’ve another solution (it was said in editorial that it is impossible): do everything like in matrix multiplication, but store only first row. It is still linear transformation so we need to calculate only k^2 coefficients on every step, but its nonlinear from part could be calculated only once and all other could be done in several fft. My solution has complexity O(k^2+klog k log n)

junior94 · October 19, 2016, 5:50pm

Great editorial.

It seems like the generating function operations can also be applied to define a general recurrence relation for two distinct sequences, to distinguish between them we would start with different initial values.

I’ve read and re-read the generating function part again and again and even looked at different articles online but, the part about multiplying the denominator by another generating function and being able to reproduce the same sequence but with a different recurrence relation still does not seem intuitive for me. If anyone has a source that explains more about this or approaches it from a different angle I would highly appreciate it.

ryuhei · August 1, 2018, 1:29pm

Thanks for the great editorial!
I wonder that the idea in “Generating functions (passes subtask 1)” gives a solution with O(k log k log N) time.

but walking the recurrence k steps forward seems really hard to optimize (if you know of any way to do it quickly, please let me know!).

Our goal is to compute the coefficient of x^N of P(x)/Q(x), which we denote by P(x)/Q(x) [N].
As written in the editorial, P(x)/Q(x) = P(x)Q(-x) / (Q(x)Q(-x)).
Since we are only interested in the coefficient of x^N, we only have to consider half of the terms of P(x)Q(-x).
For instance, if N is even, all odd degree terms of P(x)Q(-x) do not matter since the denominator Q(x)Q(-x) is an even polynomial.
Hence, P(x)/Q(x) [N] = E(P(x)Q(-x)) / (Q(x)Q(-x)) [N] = P’(x)/Q’(x) [N/2]
where E(R(x)) denotes the polynomial consists only of even degree terms of R(x), P’(x^2) := E(P(x)Q(-x)) and where Q’(x^2) := Q(x)Q(-x).
If N is odd, we obtain a similar recursion, P(x)/Q(x) [N] = O(P(x)Q(x)) / (Q(x)Q(-x)) [N] = P’(x)/Q’(x) [(N-1)/2] where O(R(x)) denotes the polynomial consists only of odd degree terms of R(x), P’(x^2) := O(P(x)Q(-x))/x and where Q’(x^2) := Q(x)Q(-x).
Here, the degrees of P’ and Q’ are at most k-1 and k, respectively.
Then, we obtain the recursion in which the complexity of each step is O(k log k) by FFT, and the number of steps is O(log N). Hence, we obtain O(k log k log N) time algorithm.

kevinsogo · May 9, 2015, 6:44pm

By the way, when I said that it’s “impossible”, I only referred to techniques that rely on general matrix exponentiation, i.e. without any special assumption on the matrix.

But good job on your solution!