PROBLEM LINK:
Practice
Div2 Contest
Div1 Contest
Author & Editorialist: Cisco Ortega
Tester : Danya Smelskiy
DIFFICULTY:
Hard
PREREQUISITES:
Math, Generating Functions, FFT
PROBLEM:
Consider an m\times n grid where (1,1) is the topleft corner and (m,n) is the bottomright corner. From some cell, we can travel to the cell directly below it or the cell directly to its right. We are given an array of n distinct integers a_1, a_2, \dots a_n, and all cells in column j have the value a_j. The weight of a path is the product of all cells on a given path.
For Q different values of k_i, find the sum of the weights of all paths that begin at (1,1) and end at (k_i,n), taken modulo 7\times17\times2^{23}+1.
QUICK EXPLANATION:

Use generating functions to find that the sum of the weights of all paths from (1,1) to (m,n) is equal to the coefficient of x^{m+n1} in \dfrac{x^n \prod_j a_j}{\prod_j (1a_j x)}.This is equivalent to finding the coefficient of x^{m1} in \dfrac{(1)^n}{\prod_j (xa_j^{1})}.

If Q \leq 10,
 Notice that the answer for some k_i is independent of m. So, if we have a method to compute the answer for some m, we can use it for any query.
 Use partial fraction decomposition to express this as (1)^n\sum_j \dfrac{K_j}{xa_j^{1}}, where K_j is the respective coefficient of the $j$th term. We can do this in \mathcal{O}(n \log^2 n)
 We reexpress this as (1)^{n+1}\sum_j \dfrac{K_j a_j}{1a_j x}. With geometric seriesâ expansion, the coefficient of x^{m1} is thus (1)^{n+1}\sum_j K_ja_j^m. We can evaluate this sum in \mathcal{O}(n \log m).
 The final complexity over all queries is \mathcal{O}(n \log^2 n+Qn\log m).
 If m \leq 10^5,
 We can evaluate \prod_j (xa_j^{1}) in \mathcal{O}(n \log^2 n)
 We can find the first m terms of its reciprocal in \mathcal{O}(m \log m)
 We can answer each query in \mathcal{O}(1) since getting the first m terms gives us the answers for all 1 \leq k \leq m.
 The final complexity over all queries is \mathcal{O}(n\log^2 n+m\log m+Q)
EXPLANATION:
The following tutorial can be considered a crash course on several different techniques that are used in computationally manipulating polynomials. It will be a real marathon. We will assume, as prerequisite, that the reader has knowledge of the following techniques,
 Computing the modular inverse a^{1} \bmod p for some prime p
 Multiplying two n degree polynomials in \mathcal{O}(n \log n) using the Fast Fourier Transform.
 Performing a Number Theoretic Transform by using a primitive root of unity of a prime modulus instead of a complex root of unity.
and the full list of techniques covered by this editorial are,
 Given the first n terms of a power series of f(x), compute the first n terms of the power series of its reciprocal \dfrac{1}{f(x)} in \mathcal{O}(n \log n).
 Given two polynomials f(x) and g(x), where \deg f = n and \deg g < \deg f, perform the polynomial long division f=gq+r in \mathcal{O}(n \log n).
 Given n numbers r_1, r_2, \dots r_n, perform the âbinarysplittingâ algorithm for computing (xr_1)(xr_2)\dots(xr_n) in \mathcal{O}(n \log^2 n).
 Given an n degree polynomial A(x), evaluate it at n distinct points r_1, r_2, \dots r_n in \mathcal{O}(n \log^2 n)
 Given the rational expression \dfrac{1}{(xr_1)(xr_2)\dots(xr_n)} where the r_i are pairwisedistinct, find its partial fraction decomposition in \mathcal{O}(n \log^2 n).
Each technique will be in its own labeled and mostly selfcontained section. For flow of the editorial, the techniques will be discussed in roughly the opposite order that they were listed above.
Finding the Formula
We begin with some preliminary observations.
 The answer for some k_i is actually independent of m. So, for the rest of this editorial, we will assume that the query is m, since the same technique will also work if k_i were any other number.
 Each path from (1,1) to (m,n) passes through exactly m+n1 cells.
 Suppose you counted that you passed through c_1 cells in the first column, then through c_2 cells in the second column, and so on, then through c_n cells in the final column, where 1 \leq c_i \leq m and c_1+c_2+\dots c_n=m+n1.
 Once you âexitâ a column by moving right, you will never reenter that column again.
 Thus, there is a onetoone relation between a path from (1,1) to (m,n) and such a sequence of values (c_1, c_2, \dots c_n).
Thus, the problem can be reduce to evaluating the sum,
where 1 \leq c_i \leq m. If you are familiar with manipulating generating functions, you will recognize that this is
where we use the notation [x^k]f(x) to mean "the coefficient of x^k in f(x)". We can pull out the common a_j x in each multiplicand to get
Since multiplying by x^n merely shifts all the terms in the final product n places up, this is equivalent to finding the coefficient
Now, this is a geometric series. If it were an infinite geometric series, it would have a neat formula, but unfortunately our series are only finite. However, notice that we only need the coefficient of x^{m1} in the final product. So, even if we add extra terms like x^m or x^{m+1} or higher, they will be too large to contribute to the coefficient of a lowerdegree term. So, what we are looking for is in fact equal to
How do we then find the coefficient of x^{m1} in this expression? If we had a simpler case with n=1, then
by the geometric series formula (since we are merely using this as a formal power series and wonât actually evaluate the function at any point, we donât need to worry about convergence). This motivates us to recall a technique from high schoolâpartial fraction decomposition. For reasons which will become clearer later, it will be more convenient to rewrite the above expression as
With partial fraction decomposition, we can rewrite the above product as
where each K_j is some constant coefficient. If we are able to find such K_j for the decomposition, then the answer we want is simply going to be
So, the question isâhow do we efficiently perform a partial fraction decomposition?
Partial Fraction Decomposition
Suppose we want to perform the partial fraction decomposition,
where the r_i are distinct (in our problem, these would be the a_i^{1}). Multiply both sides by \prod (xr_i) to get
Since the r_i are distinct, we can solve for a particular K_k by plugging in x=r_k into the equation. For every term except for when i=k, notice that \prod_{j \neq i}(r_k r_j)=0. So, we are ultimately left with
If we were to naively compute this product for each 1 \leq k \leq n, this would take \mathcal{O}(n^2) time, which is too slow. So, we have to employ a smarter trick. I attribute the following technique to the paper âComplexity of numerical algorithms for polynomialsâ (Chin, 1977). Define
Letâs try to take the derivative of Q(x). We compute for Q'(x) using the product rule,
Notice something neat? What happens when you plug in x=r_k? Similar to the trick shown earlier, all the terms vanish except for when i=k. So, we get
which leads us to the amazing realization that
So, to find each of the K_k, all we have to do is evaluate the polynomial Q'(x) at the n different points r_1, r_2, \dots r_n. So, we now have the following twostep plan.
 Find the explicit coefficients of Q'(x)
 Evaluate Q'(x) at n different points in subquadratic time.
Finding Q'(x) with Binary Splitting
If we knew the coefficients of
then the coefficients of Q'(x) will simply be
So, it suffices for us to be able to find the coefficients of Q(x). Recall that we defined Q(x) to be
If we were to multiply each xr_i onebyone from left to right, we would get an \mathcal{O}(n^2) time complexity. So, we will do something smarter insteadâDivide and Conquer.
Let M=\lfloor \frac{1+n}{2}\rfloor. Then, we define two functions
We notice that computing the coefficients of each of these is a smaller subproblem. How much smaller are the subproblems? Note that \deg Q_1 \approx \frac{n}{2}. Similarly, \deg Q_2 \approx \frac{n}{2}.
Also, once we have the coefficients of Q_1(x) and Q_2(x), we can merge their answers together to get the coefficients of Q(x) since Q(x)=Q_1(x)Q_2(x). This gives us a recurrence of
if we use FFT (an NTT, to be precise) to multiply Q_1(x) and Q_2(x) together. By Master Theorem, this has a running time of T(n)=\mathcal{O}(n \log^2 n). I have heard this technique be referred to as binary splitting in the literature.
Evaluating Q'(x) at n arbitrary points
Fun fact: This is given as a guided exercise in CLRS!
For our problem, we will be evaluating Q'(x), but this technique will work for any n degree polynomial.
Suppose we have some n degree polynomial A(x) which we need to evaluate at the n different points r_1, r_2, \dots r_n. To do so, we will employ our favorite vanishing trick in order to perform some divide and conquer.
Just as in the last section, let
and define Q_1 and Q_2 similarly as
where M=\left\lfloor\frac{1+n}{2}\right\rfloor. Consider some r_k where 1 \leq k \leq M. Due to how it was defined, Q_1(r_k)=0. Similarly, if M+1 \leq k \leq n, then Q_2(r_k)=0. This motivates the following steps.
Notice that the degrees of Q_1 and Q_2 are less than \deg A = n. So, we can use polynomial long division to write
and
where A_1 and A_2 are the unique polynomials such that \deg A_1 < \deg Q_1 and \deg A_2 < \deg Q_2.
What happens then if plug in each r_k into A(x)? Note that if 1 \leq k \leq M, since Q_1(r_k)=0,
and if M+1 \leq k \leq n, since Q_2(r_k)=0,
In summary, A(x) can be defined on the n points as
where A_1 can be thought of as A \bmod Q_1, and A_2 can be thought of as A \bmod Q_2.
So, in order to evaluate A(x) at each of the r_k, we can evaluate A_1(x) at r_1, r_2, \dots r_M, and evaluate A_2(x) at r_{M+1}, r_{M+2}, \dots r_n. These are smaller subproblems, so we apply Divide and Conquer. We are assured that \deg A_1 < \frac{n}{2} and \deg A_2 < \frac{n}{2} (why?)
How do we find Q_1 and Q_2? Well, we already computed them earlier when we were performing binary splitting! We should just modify our binary splitting algorithm so that it actually stores the coefficients of each Q_1 and Q_2 somewhere at each step of the process.
Of course, there is the actual matter of actually finding A_1(x) and A_2(x). We claim for now that polynomial long division can be performed in \mathcal{O}(n \log n).
Thus, we have a recurrence of T(n) = 2T(\frac{n}{2})+\mathcal{O}(n \log n). Again, by Master Theorem, that means our algorithm has a complexity of \mathcal{O}(n \log^2 n).
So, how do we perform polynomial long division in \mathcal{O}(n \log n)?
Polynomial Long Division
Suppose we have two polynomials f and g such that \deg g < \deg f. We wish to find the unique polynomials q and r such that f=gq+r and \deg r < \deg g.
Performing synthetic division will unfortunately be O(n^2) (if n=\deg f). We will need somethig smarter. There are many ways to do polynomial long division, but the following method is what I found to be the easiest to understand, taken from some lecture notes from MIT.
What if, miraculously, r(x)=0? Then, we would have f=gq. From here, it seems natural that the next step is to âdivideâ both sides by g, as if this were just normal algebra. It turns out that this is actually something we can do.
For some polynomial g, letâs define its reciprocal \dfrac{1}{g} to be the polynomial such that g \dfrac{1}{g} = 1. Note here that 1 refers to a function, the constant function that always returns 1, which is the identity in polynomial multiplication. Note that oftentimes the reciprocal is an infinite polynomial. We will later discuss when such a reciprocal exists and how to compute it, but for now let us take it on faith that this is something we can do.
From f=gq, multiply both sides by \dfrac{1}{g} and get \dfrac{1}{g}f = \dfrac{1}{g} gq = 1q = q. This tells us what q is, then finding r will be a simple matter of r=fgq.
This motivates us to introduce some definitions to make the r(x) vanish from the equation.
Definition: We introduce a new operator rev(f), which we read as the âreverseâ of f. Simply put, we âreverseâ the order of the coefficients of f. So, if
then
Exercise: Prove that rev(fg)=rev(f)rev(g) (aka the proof is by straightforward algebraic manipulations which I donât feel like typing out).
This turns out to be pretty important, because we then have
if c_k are the coefficients of r(x). Note that rev(gq+r)\neq rev(g)rev(q)+rev(r) because we have to reverse r with respect to the larger \deg f.
Now, by the definition of polynomial long division, \deg r < \deg g. So, \deg f\deg r > \deg f  \deg g. The degrees of the terms in the sum on the right are thus rather largeâeach has degree at least \deg f  \deg g+1. We can make them vanish with the following definition.
Definition: Let f and g be polynomials. We say that f\equiv g \bmod x^m if the coefficients of f and g agree at the first m terms. Since we start counting from 0, that means their coefficients must be equal from x^0 to x^{m1}.
Since all the terms in that sum on the right are \deg f  \deg g+1 or larger, we have
and so we are free to do
where \dfrac{1}{rev(g)} is the polynomial such that rev(g) \dfrac{1}{rev(g)} \equiv 1 \bmod x^{\deg f  \deg g +1}. Recall earlier that we said that the reciprocal could possibly be an infinite polynomial. However, we do not need all infinitely many termsâknowing its first few terms will already suffice, since we are working with a modulo. We will expound more on how to actually find this reciprocal later.
Now, note that \deg q =\deg f  \deg g. This means that q (and thus rev(q)) have no terms with degree \deg f  \deg g +1 or greater. Thus, if we compute
then the result that we get is exactly rev(q). We can reverse this again to recover q.
Again, if \deg f = n, then each reverse can be done in \mathcal{O}(n). I claim that \dfrac{1}{rev(g)} \bmod x^{\deg f  \deg g+1} can be computed in \mathcal{O}(n \log n). Finally, the product rev(f)\dfrac{1}{rev(g)} \mod x^{\deg f  \deg g+1} can be computed in \mathcal{O}(n \log n) as well using FFT.
So, how do we compute \dfrac{1}{rev(g)} \bmod x^{\deg f  \deg g+1}?
Finding a reciprocal \bmod x^n
Given the first n terms of some polynomial G(x), we say that R_n(x)=\dfrac{1}{G(x)} \bmod x^n if R_n(x) is the polynomial such that
Note that the 1 there is the constant function \textbf{1}(x)=1. I claim that such an R_n(x) exists for all G(x) where G(0) \neq 0.
I first encountered the following technique from the editorial of the Codeforces problem The Child and Binary Tree. The technique is reminiscent of âliftingâ algorithms. We will find a solution in some modulus, and find a way to âliftâ that solution up to a greater mod.
In particular, if we know R_k(x), our algorithm will produce for us the function R_{2k}(x).
Exercise: Find the value of R_1(x). Hint: Itâs constant.
Now, suppose we have some R_k(x) such that
Rearranging,
Since here, the term of smallest degree with a nonzero coefficient is at least x^k, when we square it,
we are certain that the term of smallest degree with a nonzero coefficient is now at least x^{2k}. Expanding, we get
By the definition then, we see that
Since we can easily procure the starting point R_1(x), we can iterate this algorithm as many times as necessary until we reach a power of 2 that is at least n. If 2^k \geq n, then R_{2^k}(x) \equiv R_n(x) \mod x^n (why?).
What is the runtime of this algorithm? At each step, we have to perform a constant number of multiplications. Crucially, note that computing R_{k}(x) only requires multiplying the first k terms of all involved polynomials (since it will be reduced \bmod x^k anyway), so we only perform \mathcal{O}(k \log k) operations at each step.
Thus, we have a running time of T(n)=T(\frac{n}{2})+\mathcal{O}(n \log n), which by Master Theorem has a complexity of \mathcal{O}(n \log n).
Fun Exercise: Use this idea to create an algorithm which finds the first n terms of \sqrt{G(x)} in \mathcal{O}(n \log^2 n).
In summary
 We can multiply two polynomials in \mathcal{O}(n \log n) using FFT
 With this, we can find the reciprocal of a polynomial \bmod x^n in \mathcal{O}(n \log n) using a lifting algorithm.
 With this, we can do polynomial long division in \mathcal{O}(n \log n) using the reversing trick.
 We can evaluate (xr_1)(xr_2)\dots(xr_n) in \mathcal{O}(n \log ^2 n) using binary splitting.
 With these, we can evaluate a degree n polynomial at n arbitrary points in \mathcal{O}(n \log^2 n) using a Divide and Conquer algorithm.
 With this, we can compute the partial fraction decomposition of n different linear terms in \mathcal{O}(n \log^2 n) with Chinâs derivative trick.
 With this, we can find the answer to one query in \mathcal{O}(n \log m)
What if there are a lot of queries?
If we used the above algorithm to answer all queries, thenâincluding preprocessingâwe would have a time complexity of \mathcal{O}(n \log^2 n +Q n \log m), which will pass the third and fourth subtasks. However, this actually wonât pass the second (and possibly even the first!) subtask because there are too many queries.
However, notice that in the first and second subtasks, m \leq 10^5. For these cases, we donât even need to bother with the partial fraction decomposition at all!
Recall that what we want to find is the coefficient of x^{m1} in \dfrac{(1)^n}{Q(x)}. What you could do instead then is compute Q(x) in \mathcal{O}(n \log^2 n) using binary splitting, then find the first m terms of \dfrac{1}{Q(x)} in \mathcal{O}(m \log m) with the reciprocalfinding algorithm.
Note that this already produces the coefficients of the first m terms, so we already have the answer for all k in 1\leq k \leq m, all at once.
With this solution, we can answer all queries in \mathcal{O}(1) because it becomes a simple array lookup. This method gives a complexity of \mathcal{O}(n\log^2 n+m \log m +Q), which will pass the first two subtasks.
We can combine this with our first solution with an if statement that checks if m is small enough to do this.
Final Words
If you made it this far, congratulations! I hope you learned something new about polynomials. This editorial is more comprehensive and elaborate because I wanted there to be a good reference for these techniques. I wanted to give something back to the competitive programming community, since I had learned so much from the generous blog posts of others.
I love how each of the techniques builds on each other, each with its own interesting trick. I hope to see more generating function and polynomial manipulation problems in contests!
The original version of this problem had a_j = j, i.e. \{a_k\} = 1, 2, 3, \dots n. Its solution noted a relationship with Stirling Numbers of the Second Kind in order to solve it subquadratically. Alei asked me to make the problem more algorithmic, so I changed it to the current version of the problem.
When I first wrote this problem, I did not know how to do partial fractions or polynomial long divisionâI only learned those with the help of Google. I also theoretically knew about multipoint evaluation due to CLRS but this was my first time implementing it. This was just as much a learning experience for me, but it seems that a lot of 7â participants already had these techniques prepared in their libraries!
Thanks for reading, and I hope you enjoyed this editorial :))
SOLUTIONS:
Setter's Solution
#include<iostream>
#include<stack>
#include<cassert>
#include<cstring>
#include<algorithm>
using namespace std;
typedef long long LL;
const int DEPTH = 19;
const int ARR_DEPTH = (1<<DEPTH); //approx 5x10^5
const int SZ = 12;
LL temp[SZ][ARR_DEPTH+1];
const LL MOD = 998244353;
struct GF_Manager
{
int tC = 0;
stack<int> to_be_freed;
const static LL DEPTH = 23;
LL primitives[DEPTH+1];
LL primitives_inv[DEPTH+1];
LL two_inv[DEPTH+1];
LL mod_pow(LL base, LL exp)
{
if(exp==0) return 1;
if(exp&1) return (base*mod_pow(base,exp1))%MOD;
else return mod_pow((base*base)%MOD, exp/2);
}
void set_up_primitives()
{
primitives[DEPTH] = 31;
primitives_inv[DEPTH] = mod_pow(primitives[DEPTH], MOD2);
two_inv[DEPTH] = mod_pow(1<<DEPTH,MOD2);
for(int n = DEPTH1; n >= 0; n)
primitives[n] = (primitives[n+1]*primitives[n+1])%MOD,
primitives_inv[n] = mod_pow(primitives[n],MOD2),
two_inv[n] = mod_pow(1<<n,MOD2);
}
GF_Manager(){ set_up_primitives(); }
void start_claiming(){ to_be_freed.push(0); }
LL* claim(){ ++to_be_freed.top(); assert(tC < SZ); return temp[tC++]; }
void end_claiming(){ tC = to_be_freed.top(); to_be_freed.pop(); }
void clean(LL A[], int an, bool deep_cleaning=false)
{
if(deep_cleaning)
for(int i = 0; i < an; i++)
A[i] %= MOD;
for(int i = 0; i < an; i++)
if(A[i] < 0)
A[i] += MOD;
}
void NTT(LL A[], int n, LL t[], bool is_inverse=false, int offset=0)
{
if(n==0)
return;
//Put the evens first, then the odds
for(int i = 0; i < (1<<(n1)); i++)
{
t[i] = A[offset+2*i];
t[i+(1<<(n1))] = A[offset+2*i+1];
}
for(int i = 0; i < (1<<n); i++)
A[offset+i] = t[i];
NTT(A,n1,t,is_inverse,offset);
NTT(A,n1,t,is_inverse,offset+(1<<(n1)));
LL w1 = is_inverse ? primitives_inv[n] : primitives[n];
LL w = 1;
for(int i = 0; i < (1<<(n1)); i++, w=(w*w1)%MOD)
{
t[i] = (A[offset+i]+w*A[offset+(1<<(n1))+i])%MOD;
t[i+(1<<(n1))] = (A[offset+i]w*A[offset+(1<<(n1))+i])%MOD;
}
for(int i = 0; i < (1<<n); i++)
A[offset+i] = t[i];
}
int add(LL A[], int an, LL B[], int bn, LL C[])
{
int cn = 0;
for(int i = 0; i < max(an,bn); i++)
{
C[i] = A[i]+B[i];
if(C[i] <= MOD)
C[i] += MOD;
if(MOD <= C[i])
C[i] = MOD;
if(C[i]!=0)
cn = i;
}
return cn;
}
int subtract(LL A[], int an, LL B[], int bn, LL C[])
{
int cn = 0;
for(int i = 0; i < max(an,bn); i++)
{
C[i] = A[i]B[i];
if(C[i] <= MOD)
C[i] += MOD;
if(MOD <= C[i])
C[i] = MOD;
if(C[i]!=0)
cn = i;
}
return cn+1;
}
int scalar_mult(LL v, LL A[], int an, LL C[])
{
for(int i = 0; i < an; i++)
C[i] = (v*A[i])%MOD;
return v==0 ? 0 : an;
}
int mult(LL A[], int an, LL B[], int bn, LL C[])
{
start_claiming();
//make sure you've called setup primitives first
//note: an and bn refer to the *number of items in each array*, NOT the degree of the largest term
int n, degree = an+bn1;
for(n=0; (1<<n) < degree; n++);
LL* tA = claim();
LL* tB = claim();
LL* t = claim();
copy(A,A+an,tA); fill(tA+an,tA+(1<<n),0);
copy(B,B+bn,tB); fill(tB+bn,tB+(1<<n),0);
NTT(tA,n,t);
NTT(tB,n,t);
for(int i = 0; i < (1<<n); i++)
tA[i] = (tA[i]*tB[i])%MOD;
NTT(tA,n,t,true);
scalar_mult(two_inv[n],tA,degree,C);
end_claiming();
return degree;
}
int reciprocal(LL F[], int fn, LL R[])
{
start_claiming();
LL* tR = claim();
LL* tempR = claim();
int n;
for(n=0; (1<<n) < fn; n++);
fill(tempR,tempR+(1<<n),0);
tempR[0] = mod_pow(F[0],MOD2);
for(int i = 1; i <= n; i++)
{
mult(tempR,1<<i,F,1<<i,tR);
tR[0] = 2;
scalar_mult(1,tR,1<<i,tR);
mult(tempR,1<<i,tR,1<<i,tempR);
}
copy(tempR,tempR+fn,R);
end_claiming();
return n;
}
int quotient(LL F[], int fn, LL G[], int gn, LL Q[])
{
start_claiming();
LL* revF = claim();
LL* revG = claim();
LL* tempQ = claim();
copy(F,F+fn,revF); reverse(revF,revF+fn);
copy(G,G+gn,revG); reverse(revG,revG+gn);
int qn = fngn+1;
reciprocal(revG,qn,revG);
mult(revF,qn,revG,qn,tempQ);
reverse(tempQ,tempQ+qn);
copy(tempQ,tempQ+qn,Q);
end_claiming();
return qn;
}
int mod(LL F[], int fn, LL G[], int gn, LL R[])
{
start_claiming();
LL* Q = claim();
LL* GQ = claim();
int qn = quotient(F,fn,G,gn,Q);
int gqn = mult(G,gn,Q,qn,GQ);
int rn = subtract(F,fn,GQ,gqn,R);
end_claiming();
return rn;
}
int derivative(LL F[], int fn, LL F_[])
{
for(LL i = 0; i < fn1; i++)
F_[i] = ((i+1)*F[i+1])%MOD;
return fn1;
}
LL horners(LL F[], int fn, LL xi)
{
LL ans = 0;
for(int i = fn1; i >= 0; i)
ans = (ans*xi+F[i])%MOD;
return ans;
}
};
GF_Manager gfManager;
LL split[DEPTH+1][2*(ARR_DEPTH)+1];
LL Fi[DEPTH+1][2*(ARR_DEPTH)+1];
int establish_binary_splitting(LL a[], int l, int r, int s=0, int offset=0)
{
if(l==r)
{
split[s][offset] = a[l]; //x^0
split[s][offset+1] = 1; //x^1
return 2;
}
int m = (l+r)/2;
int sz = ml+1;
int da = establish_binary_splitting(a,l,m,s+1,offset);
int db = establish_binary_splitting(a,m+1,r,s+1,offset+(sz<<1));
return gfManager.mult(split[s+1]+offset,da,split[s+1]+offset+(sz<<1),db,split[s]+offset);
}
void multipoint_eval(LL a[], int l, int r, LL F[], int fn, LL ans[], int s=0, int offset=0)
{
if(l==r)
{
ans[l] = gfManager.horners(F,fn,a[l]);
return;
}
int m = (l+r)/2;
int sz = ml+1;
int da = gfManager.mod(F,fn,split[s+1]+offset,sz+1, Fi[s]+offset);
int db = gfManager.mod(F,fn,split[s+1]+offset+(sz<<1),rm+1, Fi[s]+offset+(sz<<1));
multipoint_eval(a,l,m,Fi[s]+offset,da,ans,s+1,offset);
multipoint_eval(a,m+1,r,Fi[s]+offset+(sz<<1),db,ans,s+1,offset+(sz<<1));
}
LL mod_pow(LL base, LL exp)
{
if(exp==0) return 1;
if(exp&1) return (base*mod_pow(base,exp1))%MOD;
else return mod_pow((base*base)%MOD, exp>>1);
}
const int N = 1e5;
LL a[N];
LL a_inv[N];
LL Q[N+1];
LL Q_[N];
LL K[N];
int main()
{
ios::sync_with_stdio(false);
cin.tie(0);
cout.tie(0);
LL m, n, queries; cin >> m >> n >> queries;
for(int i = 0; i < n; i++)
{
cin >> a[i];
a_inv[i] = mod_pow(a[i],MOD2);
}
establish_binary_splitting(a_inv,0,n1);
memset(Q,0,sizeof Q);
copy(split[0],split[0]+n+1,Q);
if(m <= N)
{
gfManager.reciprocal(Q,m,Q);
if(n&1)
gfManager.scalar_mult(1,Q,m,Q);
gfManager.clean(Q,m);
while(queries)
{
int k; cin >> k;
cout<<Q[k1]<<'\n';
}
}
else
{
gfManager.derivative(Q,n+1,Q_);
multipoint_eval(a_inv,0,n1,Q_,n,K);
for(int i = 0; i < n; i++)
K[i] = mod_pow(K[i],MOD2);
if((n+1)&1)
gfManager.scalar_mult(1,K,n,K);
gfManager.clean(K,n);
while(queries)
{
LL k; cin >> k;
LL ans = 0;
for(int i = 0; i < n; i++)
ans = (ans+K[i]*mod_pow(a[i],k))%MOD;
cout<<ans<<'\n';
}
}
cout<<flush;
return 0;
}