PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: tibinyte
Testers: tabr, iceknight1093
Editorialist: iceknight1093
DIFFICULTY:
TBD
PREREQUISITES:
Combinatorics, the inclusion-exclusion principle, polynomial convolution with arbitrary mod
PROBLEM:
A permutation P of length 2N is called N-beautiful if the N pairs (P_i\bmod N, P_{i+N}\bmod N) are all distinct.
Given K, For each 1 \leq i \leq K find the number of i-beautiful permutations, modulo a given prime number M.
EXPLANATION:
Weāll first solve subtask 1, and then extend that solution to subtask 2.
Subtask 1
The limit on K is small enough here that \mathcal{O}(K^2) will pass.
In particular, we can solve for a fixed N in \mathcal{O}(N) time; so letās attempt to do that.
Note that we only really care about the values of P_i \bmod N, which range from 0 to N-1. Also, we have exactly two of each value.
Weāll call arrays of length 2N containing each integer from 0 to N-1 twice good arrays.
Letās call a permutation that isnāt N-beautiful, N-ugly.
Weāll try to count the number of N-ugly permutations.
Suppose we have a N-ugly permutation.
Then, there exist two indices 1 \leq i \lt j \leq N such that P_i = P_j and P_{i+N} = P_{j+N}.
Such a pair will be called a bad pair.
In particular, note that these four positions use all copies of P_i and P_{i+N}, so the rest of the elements are simply a good array of length 2N-4 (after appropriate relabelling).
Itās not too hard to count the number of such arrays, since fixing i, j, the values P_i and P_{i+N}, and which copy of each value goes to which position is simple math and a bunch of multiplications.
However, we canāt directly subtract this count from (2N)!, since a good array with multiple bad pairs was counted multiple times!
This should lead you to the idea of using the inclusion-exclusion principle.
Let B_i be the number of good arrays with at least i bad pairs.
By the principle of inclusion-exclusion, the answer weāre looking for is
So, letās focus on computing B_i quickly.
This is a generalization of how we computed B_1, and is just a bunch of combinatorics.
Details
There are i bad pairs, which fixes 2i positions among the first N. These can be chosen in \binom{N}{2i} ways.
We need to now split these 2i positions into i pairs.
This can be done in \binom{2i}{2, 2, 2, \ldots, 2} = \frac{(2i)!}{2^i} ways by definition of the multinomial coefficient.
Next, we need to fix which elements go into each pair. Letās fix an order of the pairs and assign values one by one.
This can be done in N\cdot (N-1)\cdot (N-2) \cdot \ldots \cdot (N-2i+1) ways, which after a little algebraic manipulation is seen to be just \binom{N}{2i} (2i)!.
However, we also need to divide by i! since the order in which we assign values to the pairs doesnāt actually matter (it makes no difference to the final permutation).
Each bad pair now has four choices (we can swap elements with the same value modulo N to obtain a different permutation with the same bad pair count).
This gives us 4^i choices in total.
Finally, there are (2N - 4i)! ways to arrange the elements not among these bad pairs.
Putting everything together, we have
which can be computed in \mathcal{O}(\log N) or even \mathcal{O}(1) time, since everything is either a factorial or a power of two (or an inverse of one of them)
Since each B_i can be computed quickly, we can solve for a fixed N in \mathcal{O}(N) time, which is good enough for subtask 1.
Subtask 2
Letās look at our expression for B_i again, and clean it up a bit by cancelling out common terms.
We obtain
The N! terms are independent of i, so letās ignore them for now.
Notice that we have the product of an āi-partā comprising \frac{2^i}{i!} and something like an "N-i part" comprising \binom{2N - 4i}{N - 2i}.
This should make you think along the lines of convolution.
However, we canāt directly multiply two polynomials: the N-i part doesnāt quite depend on N-i, it depends on N - 2i (and thereās no apparent nice way to make the i-part depend on 2i instead).
Itād be nice if we had 2N instead of N, so letās do just that!
We split the computation into two cases: even N and odd N. Letās deal with the even case first.
So, suppose N = 2n.
We now have \displaystyle\binom{2N - 4i}{N - 2i} = \binom{4n - 4i}{2n - 2i} = \binom{4(n - i)}{2(n-i)}, which does depend on n - i.
So, create the polynomials p and q, with coefficients
- p_i =\frac{ (-1)^i 2^i}{i!}
- q_i = \binom{4i}{2i}
and multiply them together so obtain r.
The answer for 2n is then r_n, multiplied by (2n)!^2 (which we took out at the start).
For odd N, a similar technique works: write N = 2n + 1, and once again youāll get a very similar relation on n-i. Following this, multiply the appropriate polynomials and weāre done.
Note that this problem requires multiplying two polynomials and keeping their coefficients modulo some M, where M isnāt necessarily āniceā.
Standard FFT and NTT wonāt suffice here; a tutorial on how to do this can be found at cp-algorithms.
TIME COMPLEXITY
\mathcal{O}(K\log K) per testcase.
CODE:
Setter's code (C++)
#include <bits/stdc++.h>
using namespace std;
int n, mod;
struct modint
{
int32_t value;
modint() = default;
modint(int32_t value_) : value(value_ % mod) {}
modint(int64_t value_) : value(value_ % mod) {}
inline modint operator+(modint other) const
{
int32_t c = this->value + other.value;
return modint(c >= mod ? c - mod : c);
}
inline modint operator-(modint other) const
{
int32_t c = this->value - other.value;
return modint(c < 0 ? c + mod : c);
}
inline modint operator*(modint other) const
{
int32_t c = (int64_t)this->value * other.value % mod;
return modint(c < 0 ? c + mod : c);
}
inline modint &operator+=(modint other)
{
this->value += other.value;
if (this->value >= mod)
this->value -= mod;
return *this;
}
inline modint &operator-=(modint other)
{
this->value -= other.value;
if (this->value < 0)
this->value += mod;
return *this;
}
inline modint &operator*=(modint other)
{
this->value = (int64_t)this->value * other.value % mod;
if (this->value < 0)
this->value += mod;
return *this;
}
inline modint operator-() const { return modint(this->value ? mod - this->value : 0); }
modint pow(int32_t k) const
{
modint x = *this, y = 1;
for (; k; k >>= 1)
{
if (k & 1)
y *= x;
x *= x;
}
return y;
}
modint inv() const { return pow(mod - 2); } // MOD must be a prime
inline modint operator/(modint other) const { return *this * other.inv(); }
inline modint operator/=(modint other) { return *this *= other.inv(); }
inline bool operator==(modint other) const { return value == other.value; }
inline bool operator!=(modint other) const { return value != other.value; }
inline bool operator<(modint other) const { return value < other.value; }
inline bool operator>(modint other) const { return value > other.value; }
};
modint operator*(int64_t value, modint n) { return modint(value) * n; }
modint operator*(int32_t value, modint n) { return modint(value) * n; }
istream &operator>>(istream &in, modint &n) { return in >> n.value; }
ostream &operator<<(ostream &out, modint n) { return out << n.value; }
struct combi
{
int n;
vector<modint> facts, finvs, invs;
combi(int _n) : n(_n), facts(_n), finvs(_n), invs(_n)
{
facts[0] = finvs[0] = 1;
invs[1] = 1;
for (int i = 2; i < n; i++)
invs[i] = invs[mod % i] * (-mod / i);
for (int i = 1; i < n; i++)
{
facts[i] = facts[i - 1] * i;
finvs[i] = finvs[i - 1] * invs[i];
}
}
inline modint fact(int n) { return facts[n]; }
inline modint finv(int n) { return finvs[n]; }
inline modint inv(int n) { return invs[n]; }
inline modint ncr(int n, int k) { return n < k or k < 0 ? 0 : facts[n] * finvs[k] * finvs[n - k]; }
inline modint aranj(int n, int k) { return ncr(n, k) * facts[k]; }
};
struct base
{
double x, y;
base() { x = y = 0; }
base(double x, double y) : x(x), y(y) {}
};
inline base operator+(base a, base b) { return base(a.x + b.x, a.y + b.y); }
inline base operator-(base a, base b) { return base(a.x - b.x, a.y - b.y); }
inline base operator*(base a, base b) { return base(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x); }
inline base conj(base a) { return base(a.x, -a.y); }
int lim = 1;
vector<base> roots = {{0, 0}, {1, 0}};
vector<int> rev = {0, 1};
const double PI = acosl(-1.0);
void ensure_base(int p)
{
if (p <= lim)
return;
rev.resize(1 << p);
for (int i = 0; i < (1 << p); i++)
rev[i] = (rev[i >> 1] >> 1) + ((i & 1) << (p - 1));
roots.resize(1 << p);
while (lim < p)
{
double angle = 2 * PI / (1 << (lim + 1));
for (int i = 1 << (lim - 1); i < (1 << lim); i++)
{
roots[i << 1] = roots[i];
double angle_i = angle * (2 * i + 1 - (1 << lim));
roots[(i << 1) + 1] = base(cos(angle_i), sin(angle_i));
}
lim++;
}
}
void fft(vector<base> &a, int n = -1)
{
if (n == -1)
n = a.size();
assert((n & (n - 1)) == 0);
int zeros = __builtin_ctz(n);
ensure_base(zeros);
int shift = lim - zeros;
for (int i = 0; i < n; i++)
if (i < (rev[i] >> shift))
swap(a[i], a[rev[i] >> shift]);
for (int k = 1; k < n; k <<= 1)
{
for (int i = 0; i < n; i += 2 * k)
{
for (int j = 0; j < k; j++)
{
base z = a[i + j + k] * roots[j + k];
a[i + j + k] = a[i + j] - z;
a[i + j] = a[i + j] + z;
}
}
}
}
vector<int> multiply(vector<int> &a, vector<int> &b, int eq = 0)
{
int need = a.size() + b.size() - 1;
int p = 0;
while ((1 << p) < need)
p++;
ensure_base(p);
int sz = 1 << p;
vector<base> A, B;
if (sz > (int)A.size())
A.resize(sz);
for (int i = 0; i < (int)a.size(); i++)
{
int x = (a[i] % mod + mod) % mod;
A[i] = base(x & ((1 << 15) - 1), x >> 15);
}
fill(A.begin() + a.size(), A.begin() + sz, base{0, 0});
fft(A, sz);
if (sz > (int)B.size())
B.resize(sz);
if (eq)
copy(A.begin(), A.begin() + sz, B.begin());
else
{
for (int i = 0; i < (int)b.size(); i++)
{
int x = (b[i] % mod + mod) % mod;
B[i] = base(x & ((1 << 15) - 1), x >> 15);
}
fill(B.begin() + b.size(), B.begin() + sz, base{0, 0});
fft(B, sz);
}
double ratio = 0.25 / sz;
base r2(0, -1), r3(ratio, 0), r4(0, -ratio), r5(0, 1);
for (int i = 0; i <= (sz >> 1); i++)
{
int j = (sz - i) & (sz - 1);
base a1 = (A[i] + conj(A[j])), a2 = (A[i] - conj(A[j])) * r2;
base b1 = (B[i] + conj(B[j])) * r3, b2 = (B[i] - conj(B[j])) * r4;
if (i != j)
{
base c1 = (A[j] + conj(A[i])), c2 = (A[j] - conj(A[i])) * r2;
base d1 = (B[j] + conj(B[i])) * r3, d2 = (B[j] - conj(B[i])) * r4;
A[i] = c1 * d1 + c2 * d2 * r5;
B[i] = c1 * d2 + c2 * d1;
}
A[j] = a1 * b1 + a2 * b2 * r5;
B[j] = a1 * b2 + a2 * b1;
}
fft(A, sz);
fft(B, sz);
vector<int> res(need);
for (int i = 0; i < need; i++)
{
long long aa = A[i].x + 0.5;
long long bb = B[i].x + 0.5;
long long cc = A[i].y + 0.5;
res[i] = (aa + ((bb % mod) << 15) + ((cc % mod) << 30)) % mod;
}
return res;
}
int main()
{
cin.tie(nullptr)->sync_with_stdio(false);
cin >> n >> mod;
combi C(n + n + 1);
int lim = n / 2;
vector<int> first(lim + 1);
for (int i = 0; i <= lim; ++i)
{
modint rep = C.fact(4 * i) * C.finv(2 * i) * C.finv(2 * i);
first[i] = rep.value;
}
vector<int> second(lim + 1);
for (int i = 0; i <= lim; ++i)
{
modint rep = modint(2).pow(i) * modint(-1).pow(i) * C.finv(i);
second[i] = rep.value;
}
vector<int> sumeven = multiply(first, second);
first = vector<int>(lim + 2);
for (int i = 1; i <= lim + 1; ++i)
{
modint rep = C.fact(4 * i - 2) * C.finv(2 * i - 1) * C.finv(2 * i - 1);
first[i] = rep.value;
}
vector<int> sumodd = multiply(first, second);
for (int i = 1; i <= n; ++i)
{
if (i % 2 == 1)
{
modint rep = C.fact(i) * C.fact(i) * sumodd[(i + 1) / 2];
cout << rep << ' ';
}
else
{
modint rep = C.fact(i) * C.fact(i) * sumeven[i / 2];
cout << rep << ' ';
}
}
}
Tester's code (C++)
#include <bits/stdc++.h>
using namespace std;
#ifdef tabr
#include "library/debug.cpp"
#else
#define debug(...)
#endif
struct input_checker {
string buffer;
int pos;
const string all = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const string number = "0123456789";
const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const string lower = "abcdefghijklmnopqrstuvwxyz";
input_checker() {
pos = 0;
while (true) {
int c = cin.get();
if (c == -1) {
break;
}
buffer.push_back((char) c);
}
}
string readOne() {
assert(pos < (int) buffer.size());
string res;
while (pos < (int) buffer.size() && buffer[pos] != ' ' && buffer[pos] != '\n') {
res += buffer[pos];
pos++;
}
return res;
}
string readString(int min_len, int max_len, const string& pattern = "") {
assert(min_len <= max_len);
string res = readOne();
assert(min_len <= (int) res.size());
assert((int) res.size() <= max_len);
for (int i = 0; i < (int) res.size(); i++) {
assert(pattern.empty() || pattern.find(res[i]) != string::npos);
}
return res;
}
int readInt(int min_val, int max_val) {
assert(min_val <= max_val);
int res = stoi(readOne());
assert(min_val <= res);
assert(res <= max_val);
return res;
}
long long readLong(long long min_val, long long max_val) {
assert(min_val <= max_val);
long long res = stoll(readOne());
assert(min_val <= res);
assert(res <= max_val);
return res;
}
vector<int> readInts(int size, int min_val, int max_val) {
assert(min_val <= max_val);
vector<int> res(size);
for (int i = 0; i < size; i++) {
res[i] = readInt(min_val, max_val);
if (i != size - 1) {
readSpace();
}
}
return res;
}
vector<long long> readLongs(int size, long long min_val, long long max_val) {
assert(min_val <= max_val);
vector<long long> res(size);
for (int i = 0; i < size; i++) {
res[i] = readLong(min_val, max_val);
if (i != size - 1) {
readSpace();
}
}
return res;
}
void readSpace() {
assert((int) buffer.size() > pos);
assert(buffer[pos] == ' ');
pos++;
}
void readEoln() {
assert((int) buffer.size() > pos);
assert(buffer[pos] == '\n');
pos++;
}
void readEof() {
assert((int) buffer.size() == pos);
}
};
namespace math314 {
typedef long long ll;
typedef pair<int, int> Pii;
#define FOR(i, n) for (int i = 0; i < (n); i++)
#define sz(c) ((int) (c).size())
#define ten(x) ((int) 1e##x)
template <class T>
T extgcd(T a, T b, T& x, T& y) {
for (T u = y = 1, v = x = 0; a;) {
T q = b / a;
swap(x -= q * u, u);
swap(y -= q * v, v);
swap(b -= q * a, a);
}
return b;
}
template <class T>
T mod_inv(T a, T m) {
T x, y;
extgcd(a, m, x, y);
return (m + x % m) % m;
}
ll mod_pow(ll a, ll n, ll mod) {
ll ret = 1;
ll p = a % mod;
while (n) {
if (n & 1) ret = ret * p % mod;
p = p * p % mod;
n >>= 1;
}
return ret;
}
template <int mod, int primitive_root>
class NTT {
public:
int get_mod() const { return mod; }
void _ntt(vector<ll>& a, int sign) {
const int n = sz(a);
assert((n ^ (n & -n)) == 0); // n = 2^k
const int g = 3; // g is primitive root of mod
int h = (int) mod_pow(g, (mod - 1) / n, mod); // h^n = 1
if (sign == -1) h = (int) mod_inv(h, mod); // h = h^-1 % mod
// bit reverse
int i = 0;
for (int j = 1; j < n - 1; ++j) {
for (int k = n >> 1; k > (i ^= k); k >>= 1)
;
if (j < i) swap(a[i], a[j]);
}
for (int m = 1; m < n; m *= 2) {
const int m2 = 2 * m;
const ll base = mod_pow(h, n / m2, mod);
ll w = 1;
FOR(x, m) {
for (int s = x; s < n; s += m2) {
ll u = a[s];
ll d = a[s + m] * w % mod;
a[s] = u + d;
if (a[s] >= mod) a[s] -= mod;
a[s + m] = u - d;
if (a[s + m] < 0) a[s + m] += mod;
}
w = w * base % mod;
}
}
for (auto& x : a)
if (x < 0) x += mod;
}
void ntt(vector<ll>& input) {
_ntt(input, 1);
}
void intt(vector<ll>& input) {
_ntt(input, -1);
const int n_inv = mod_inv(sz(input), mod);
for (auto& x : input) x = x * n_inv % mod;
}
// ē³ćæč¾¼ćæę¼ē®ćč”ć
vector<ll> convolution(const vector<ll>& a, const vector<ll>& b) {
int ntt_size = 1;
while (ntt_size < sz(a) + sz(b)) ntt_size *= 2;
vector<ll> _a = a, _b = b;
_a.resize(ntt_size);
_b.resize(ntt_size);
ntt(_a);
ntt(_b);
FOR(i, ntt_size) {
(_a[i] *= _b[i]) %= mod;
}
intt(_a);
return _a;
}
};
ll garner(vector<Pii> mr, int mod) {
mr.emplace_back(mod, 0);
vector<ll> coffs(sz(mr), 1);
vector<ll> constants(sz(mr), 0);
FOR(i, sz(mr) - 1) {
// coffs[i] * v + constants[i] == mr[i].second (mod mr[i].first) ćč§£ć
ll v = (mr[i].second - constants[i]) * mod_inv<ll>(coffs[i], mr[i].first) % mr[i].first;
if (v < 0) v += mr[i].first;
for (int j = i + 1; j < sz(mr); j++) {
(constants[j] += coffs[j] * v) %= mr[j].first;
(coffs[j] *= mr[i].first) %= mr[j].first;
}
}
return constants[sz(mr) - 1];
}
typedef NTT<167772161, 3> NTT_1;
typedef NTT<469762049, 3> NTT_2;
typedef NTT<1224736769, 3> NTT_3;
vector<ll> int32mod_convolution(vector<ll> a, vector<ll> b, int mod) {
for (auto& x : a) x %= mod;
for (auto& x : b) x %= mod;
NTT_1 ntt1;
NTT_2 ntt2;
NTT_3 ntt3;
auto x = ntt1.convolution(a, b);
auto y = ntt2.convolution(a, b);
auto z = ntt3.convolution(a, b);
vector<ll> ret(sz(x));
vector<Pii> mr(3);
FOR(i, sz(x)) {
mr[0].first = ntt1.get_mod(), mr[0].second = (int) x[i];
mr[1].first = ntt2.get_mod(), mr[1].second = (int) y[i];
mr[2].first = ntt3.get_mod(), mr[2].second = (int) z[i];
ret[i] = garner(mr, mod);
}
return ret;
}
} // namespace math314
long long mod = 2;
struct mint {
long long value;
mint(long long x = 0) {
value = x % mod;
if (value < 0) value += mod;
}
mint& operator+=(const mint& other) {
if ((value += other.value) >= mod) value -= mod;
return *this;
}
mint& operator-=(const mint& other) {
if ((value -= other.value) < 0) value += mod;
return *this;
}
mint& operator*=(const mint& other) {
value = value * other.value % mod;
return *this;
}
mint& operator/=(const mint& other) {
long long a = 0, b = 1, c = other.value, m = mod;
while (c != 0) {
long long t = m / c;
m -= t * c;
swap(c, m);
a -= t * b;
swap(a, b);
}
a %= mod;
if (a < 0) a += mod;
value = value * a % mod;
return *this;
}
friend mint operator+(const mint& lhs, const mint& rhs) { return mint(lhs) += rhs; }
friend mint operator-(const mint& lhs, const mint& rhs) { return mint(lhs) -= rhs; }
friend mint operator*(const mint& lhs, const mint& rhs) { return mint(lhs) *= rhs; }
friend mint operator/(const mint& lhs, const mint& rhs) { return mint(lhs) /= rhs; }
mint& operator++() { return *this += 1; }
mint& operator--() { return *this -= 1; }
mint operator++(int) {
mint result(*this);
*this += 1;
return result;
}
mint operator--(int) {
mint result(*this);
*this -= 1;
return result;
}
mint operator-() const { return mint(-value); }
bool operator==(const mint& rhs) const { return value == rhs.value; }
bool operator!=(const mint& rhs) const { return value != rhs.value; }
bool operator<(const mint& rhs) const { return value < rhs.value; }
};
string to_string(const mint& x) {
return to_string(x.value);
}
ostream& operator<<(ostream& stream, const mint& x) {
return stream << x.value;
}
istream& operator>>(istream& stream, mint& x) {
stream >> x.value;
x.value %= mod;
if (x.value < 0) x.value += mod;
return stream;
}
mint power(mint a, long long n) {
mint res = 1;
while (n > 0) {
if (n & 1) {
res *= a;
}
a *= a;
n >>= 1;
}
return res;
}
vector<mint> fact(1, 1);
vector<mint> finv(1, 1);
mint C(int n, int k) {
if (n < k || k < 0) {
return mint(0);
}
while ((int) fact.size() < n + 1) {
fact.emplace_back(fact.back() * (int) fact.size());
finv.emplace_back(1 / fact.back());
}
return fact[n] * finv[k] * finv[n - k];
}
bool is_prime(long long n) {
if (n <= 1) {
return false;
}
for (long long i = 2; i * i <= n; i++) {
if (n % i == 0) {
return false;
}
}
return true;
}
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
input_checker in;
int k = in.readInt(1, 2e5);
in.readSpace();
mod = in.readInt(1e7 + 19, 1e9 + 7);
in.readEoln();
assert(is_prime(mod));
C(2 * k + 10, 0);
vector<mint> f(k + 1);
f[0] = 1;
for (int i = 2; i <= k; i += 2) {
f[i] = f[i - 2] * -4 / i;
}
vector<mint> g(k + 1);
g[0] = 1;
for (int i = 1; i <= k; i++) {
g[i] = g[i - 1] * (4 * i - 2) / i;
}
vector<long long> ff(k + 1), gg(k + 1);
for (int i = 0; i <= k; i++) {
ff[i] = f[i].value;
gg[i] = g[i].value;
}
auto ans = math314::int32mod_convolution(ff, gg, mod);
for (int i = 1; i <= k; i++) {
cout << ans[i] * fact[i] * fact[i] << " \n"[i == k];
}
in.readEof();
return 0;
}
Editorialist's code (C++)
#include "bits/stdc++.h"
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());
template<typename T>
struct Z_p{
using Type = typename decay<decltype(T::value)>::type;
static vector<Type> MOD_INV;
constexpr Z_p(): value(){ }
template<typename U> Z_p(const U &x){ value = normalize(x); }
template<typename U> static Type normalize(const U &x){
Type v;
if(-mod() <= x && x < mod()) v = static_cast<Type>(x);
else v = static_cast<Type>(x % mod());
if(v < 0) v += mod();
return v;
}
const Type& operator()() const{ return value; }
template<typename U> explicit operator U() const{ return static_cast<U>(value); }
constexpr static Type mod(){ return T::value; }
Z_p &operator+=(const Z_p &otr){ if((value += otr.value) >= mod()) value -= mod(); return *this; }
Z_p &operator-=(const Z_p &otr){ if((value -= otr.value) < 0) value += mod(); return *this; }
template<typename U> Z_p &operator+=(const U &otr){ return *this += Z_p(otr); }
template<typename U> Z_p &operator-=(const U &otr){ return *this -= Z_p(otr); }
Z_p &operator++(){ return *this += 1; }
Z_p &operator--(){ return *this -= 1; }
Z_p operator++(int){ Z_p result(*this); *this += 1; return result; }
Z_p operator--(int){ Z_p result(*this); *this -= 1; return result; }
Z_p operator-() const{ return Z_p(-value); }
template<typename U = T>
typename enable_if<is_same<typename Z_p<U>::Type, int>::value, Z_p>::type &operator*=(const Z_p& rhs){
#ifdef _WIN32
uint64_t x = static_cast<int64_t>(value) * static_cast<int64_t>(rhs.value);
uint32_t xh = static_cast<uint32_t>(x >> 32), xl = static_cast<uint32_t>(x), d, m;
asm(
"divl %4; \n\t"
: "=a" (d), "=d" (m)
: "d" (xh), "a" (xl), "r" (mod())
);
value = m;
#else
value = normalize(static_cast<int64_t>(value) * static_cast<int64_t>(rhs.value));
#endif
return *this;
}
template<typename U = T>
typename enable_if<is_same<typename Z_p<U>::Type, int64_t>::value, Z_p>::type &operator*=(const Z_p &rhs){
uint64_t ret = static_cast<uint64_t>(value) * static_cast<uint64_t>(rhs.value) - static_cast<uint64_t>(mod()) * static_cast<uint64_t>(1.L / static_cast<uint64_t>(mod()) * static_cast<uint64_t>(value) * static_cast<uint64_t>(rhs.value));
value = normalize(static_cast<int64_t>(ret + static_cast<uint64_t>(mod()) * (ret < 0) - static_cast<uint64_t>(mod()) * (ret >= static_cast<uint64_t>(mod()))));
return *this;
}
template<typename U = T>
typename enable_if<!is_integral<typename Z_p<U>::Type>::value, Z_p>::type &operator*=(const Z_p &rhs){
value = normalize(value * rhs.value);
return *this;
}
template<typename U>
Z_p &operator^=(U e){
if(e < 0) *this = 1 / *this, e = -e;
Z_p res = 1;
for(; e; *this *= *this, e >>= 1) if(e & 1) res *= *this;
return *this = res;
}
template<typename U>
Z_p operator^(U e) const{
return Z_p(*this) ^= e;
}
Z_p &operator/=(const Z_p &otr){
Type a = otr.value, m = mod(), u = 0, v = 1;
if(a < (int)MOD_INV.size()) return *this *= MOD_INV[a];
while(a){
Type t = m / a;
m -= t * a; swap(a, m);
u -= t * v; swap(u, v);
}
assert(m == 1);
return *this *= u;
}
template<typename U> friend const Z_p<U> &abs(const Z_p<U> &v){ return v; }
Type value;
};
template<typename T> bool operator==(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value == rhs.value; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator==(const Z_p<T>& lhs, U rhs){ return lhs == Z_p<T>(rhs); }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator==(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) == rhs; }
template<typename T> bool operator!=(const Z_p<T> &lhs, const Z_p<T> &rhs){ return !(lhs == rhs); }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator!=(const Z_p<T> &lhs, U rhs){ return !(lhs == rhs); }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator!=(U lhs, const Z_p<T> &rhs){ return !(lhs == rhs); }
template<typename T> bool operator<(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value < rhs.value; }
template<typename T> bool operator>(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value > rhs.value; }
template<typename T> bool operator<=(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value <= rhs.value; }
template<typename T> bool operator>=(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value >= rhs.value; }
template<typename T> Z_p<T> operator+(const Z_p<T> &lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) += rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator+(const Z_p<T> &lhs, U rhs){ return Z_p<T>(lhs) += rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator+(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) += rhs; }
template<typename T> Z_p<T> operator-(const Z_p<T> &lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) -= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator-(const Z_p<T>& lhs, U rhs){ return Z_p<T>(lhs) -= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator-(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) -= rhs; }
template<typename T> Z_p<T> operator*(const Z_p<T> &lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) *= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator*(const Z_p<T>& lhs, U rhs){ return Z_p<T>(lhs) *= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator*(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) *= rhs; }
template<typename T> Z_p<T> operator/(const Z_p<T> &lhs, const Z_p<T> &rhs) { return Z_p<T>(lhs) /= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator/(const Z_p<T>& lhs, U rhs) { return Z_p<T>(lhs) /= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator/(U lhs, const Z_p<T> &rhs) { return Z_p<T>(lhs) /= rhs; }
template<typename T> istream &operator>>(istream &in, Z_p<T> &number){
typename common_type<typename Z_p<T>::Type, int64_t>::type x;
in >> x;
number.value = Z_p<T>::normalize(x);
return in;
}
template<typename T> ostream &operator<<(ostream &out, const Z_p<T> &number){ return out << number(); }
using ModType = int;
struct VarMod{ static ModType value; };
ModType VarMod::value;
ModType &mod = VarMod::value;
using Zp = Z_p<VarMod>;
// constexpr int mod = 1e9 + 7; // 1000000007
// constexpr int mod = (119 << 23) + 1; // 998244353
// constexpr int mod = 1e9 + 9; // 1000000009
// using Zp = Z_p<integral_constant<decay<decltype(mod)>::type, mod>>;
template<typename T> vector<typename Z_p<T>::Type> Z_p<T>::MOD_INV;
template<typename T = integral_constant<decay<decltype(mod)>::type, mod>>
void precalc_inverse(int SZ){
auto &inv = Z_p<T>::MOD_INV;
if(inv.empty()) inv.assign(2, 1);
for(; inv.size() <= SZ; ) inv.push_back((mod - 1LL * mod / (int)inv.size() * inv[mod % (int)inv.size()]) % mod);
}
template<typename T>
vector<T> precalc_power(T base, int SZ){
vector<T> res(SZ + 1, 1);
for(auto i = 1; i <= SZ; ++ i) res[i] = res[i - 1] * base;
return res;
}
template<typename T>
vector<T> precalc_factorial(int SZ){
vector<T> res(SZ + 1, 1); res[0] = 1;
for(auto i = 1; i <= SZ; ++ i) res[i] = res[i - 1] * i;
return res;
}
using C = complex<double>;
using vd = vector<double>;
const long double PI = 2*acos(0.0);
void fft(vector<C>& a) {
int n = size(a), L = 31 - __builtin_clz(n);
static vector<complex<long double>> R(2, 1);
static vector<C> rt(2, 1); // (^ 10% faster if double)
for (static int k = 2; k < n; k *= 2) {
R.resize(n); rt.resize(n);
const long double PI = 2*acos(0.0);
auto x = polar(1.0L, PI / k); // M_PI, lower-case L
for (int i = k; i < 2*k; ++i) rt[i] = R[i] = i&1 ? R[i/2] * x : R[i/2];
}
vector<int> rev(n);
for (int i = 0; i < n; ++i) rev[i] = (rev[i / 2] | (i & 1) << L) / 2;
for (int i = 0; i < n; ++i) if (i < rev[i]) swap(a[i], a[rev[i]]);
for (int k = 1; k < n; k *= 2)
for (int i = 0; i < n; i += 2 * k) for (int j = 0; j < k; ++j) {
C z = rt[j+k] * a[i+j+k]; // (25% faster if hand-rolled) /// include-line
a[i + j + k] = a[i + j] - z;
a[i + j] += z;
}
}
vd conv(const vd& a, const vd& b) {
if (a.empty() || b.empty()) return {};
vd res(size(a) + size(b) - 1);
int L = 32 - __builtin_clz(size(res)), n = 1 << L;
vector<C> in(n), out(n);
copy(begin(a), end(a), begin(in));
for (int i = 0; i < (int)size(b); ++i) in[i].imag(b[i]);
fft(in);
for (auto &x : in) x *= x;
for (int i = 0; i < n; ++i) out[i] = in[-i & (n - 1)] - conj(in[i]);
fft(out);
for (int i = 0; i < (int)size(res); ++i) res[i] = imag(out[i]) / (4 * n);
return res;
}
// FFT-MOD
typedef vector<ll> vl;
vl convMod(const vl &a, const vl &b, int M) {
if (a.empty() || b.empty()) return {};
vl res(size(a) + size(b) - 1);
int B=32-__builtin_clz(size(res)), n=1<<B, cut=int(sqrt(M));
vector<C> L(n), R(n), outs(n), outl(n);
for (int i = 0; i < (int)size(a); ++i) L[i] = C((int)a[i] / cut, (int)a[i] % cut);
for (int i = 0; i < (int)size(b); ++i) R[i] = C((int)b[i] / cut, (int)b[i] % cut);
fft(L), fft(R);
for (int i = 0; i < n; ++i) {
int j = -i & (n - 1);
outl[j] = (L[i] + conj(L[j])) * R[i] / (2.0 * n);
outs[j] = (L[i] - conj(L[j])) * R[i] / (2.0 * n) / 1i;
}
fft(outl), fft(outs);
for (int i = 0; i < (int)size(res); ++i) {
ll av = ll(real(outl[i])+.5), cv = ll(imag(outs[i])+.5);
ll bv = ll(imag(outl[i])+.5) + ll(real(outs[i])+.5);
res[i] = ((av % M * cut + bv) % M * cut + cv) % M;
}
return res;
}
int main()
{
ios::sync_with_stdio(false); cin.tie(0);
int k; cin >> k >> mod;
auto fac = precalc_factorial<Zp>(2*k+10);
auto C = [&] (int n, int r) {
if (n < r or r < 0) return Zp(0);
return fac[n] / (fac[r] * fac[n-r]);
};
vector<ll> poly1(k/2 + 1), poly2(k/2 + 1), poly3(k/2 + 1);
for (int i = 0; i <= k/2; ++i) {
Zp cur = (Zp(-2) ^ i) / fac[i];
poly1[i] = cur.value;
cur = C(4*i, 2*i);
poly2[i] = cur.value;
cur = C(4*i+2, 2*i+1);
poly3[i] = cur.value;
}
auto res1 = convMod(poly1, poly2, mod);
auto res2 = convMod(poly1, poly3, mod);
cout << res2[0] << ' ';
for (int i = 1; i < (k+1)/2; ++i) {
cout << fac[2*i] * fac[2*i] * res1[i] << ' ';
cout << fac[2*i+1] * fac[2*i+1] * res2[i] << ' ';
}
if (k%2 == 0) cout << fac[k] * fac[k] * res1[k/2] << '\n';
}