# SUSPERMS - Editorial

Author: tibinyte
Testers: tabr, iceknight1093
Editorialist: iceknight1093

TBD

# PROBLEM:

A permutation P of length 2N is called N-beautiful if the N pairs (P_i\bmod N, P_{i+N}\bmod N) are all distinct.

Given K, For each 1 \leq i \leq K find the number of i-beautiful permutations, modulo a given prime number M.

# EXPLANATION:

Weāll first solve subtask 1, and then extend that solution to subtask 2.

The limit on K is small enough here that \mathcal{O}(K^2) will pass.
In particular, we can solve for a fixed N in \mathcal{O}(N) time; so letās attempt to do that.

Note that we only really care about the values of P_i \bmod N, which range from 0 to N-1. Also, we have exactly two of each value.
Weāll call arrays of length 2N containing each integer from 0 to N-1 twice good arrays.

Letās call a permutation that isnāt N-beautiful, N-ugly.
Weāll try to count the number of N-ugly permutations.

Suppose we have a N-ugly permutation.
Then, there exist two indices 1 \leq i \lt j \leq N such that P_i = P_j and P_{i+N} = P_{j+N}.
Such a pair will be called a bad pair.

In particular, note that these four positions use all copies of P_i and P_{i+N}, so the rest of the elements are simply a good array of length 2N-4 (after appropriate relabelling).

Itās not too hard to count the number of such arrays, since fixing i, j, the values P_i and P_{i+N}, and which copy of each value goes to which position is simple math and a bunch of multiplications.

However, we canāt directly subtract this count from (2N)!, since a good array with multiple bad pairs was counted multiple times!
This should lead you to the idea of using the inclusion-exclusion principle.

Let B_i be the number of good arrays with at least i bad pairs.
By the principle of inclusion-exclusion, the answer weāre looking for is

\sum_{i=0}^{N} (-1)^N B_i

So, letās focus on computing B_i quickly.
This is a generalization of how we computed B_1, and is just a bunch of combinatorics.

Details

There are i bad pairs, which fixes 2i positions among the first N. These can be chosen in \binom{N}{2i} ways.
We need to now split these 2i positions into i pairs.
This can be done in \binom{2i}{2, 2, 2, \ldots, 2} = \frac{(2i)!}{2^i} ways by definition of the multinomial coefficient.

Next, we need to fix which elements go into each pair. Letās fix an order of the pairs and assign values one by one.
This can be done in N\cdot (N-1)\cdot (N-2) \cdot \ldots \cdot (N-2i+1) ways, which after a little algebraic manipulation is seen to be just \binom{N}{2i} (2i)!.

However, we also need to divide by i! since the order in which we assign values to the pairs doesnāt actually matter (it makes no difference to the final permutation).

Each bad pair now has four choices (we can swap elements with the same value modulo N to obtain a different permutation with the same bad pair count).
This gives us 4^i choices in total.

Finally, there are (2N - 4i)! ways to arrange the elements not among these bad pairs.

Putting everything together, we have

B_i = \binom{N}{2i} \binom{N}{2i} \frac{4^i(2i)! (2i)! (2N - 4i)!}{2^ii!}

which can be computed in \mathcal{O}(\log N) or even \mathcal{O}(1) time, since everything is either a factorial or a power of two (or an inverse of one of them)

Since each B_i can be computed quickly, we can solve for a fixed N in \mathcal{O}(N) time, which is good enough for subtask 1.

Letās look at our expression for B_i again, and clean it up a bit by cancelling out common terms.
We obtain

B_i = \frac{2^i N! N! (2N - 4i)!}{(N - 2i)! (N - 2i)! i!} = \frac{2^i N! N!}{i!} \binom{2N - 4i}{N - 2i}

The N! terms are independent of i, so letās ignore them for now.
Notice that we have the product of an āi-partā comprising \frac{2^i}{i!} and something like an "N-i part" comprising \binom{2N - 4i}{N - 2i}.

This should make you think along the lines of convolution.
However, we canāt directly multiply two polynomials: the N-i part doesnāt quite depend on N-i, it depends on N - 2i (and thereās no apparent nice way to make the i-part depend on 2i instead).
Itād be nice if we had 2N instead of N, so letās do just that!

We split the computation into two cases: even N and odd N. Letās deal with the even case first.
So, suppose N = 2n.

We now have \displaystyle\binom{2N - 4i}{N - 2i} = \binom{4n - 4i}{2n - 2i} = \binom{4(n - i)}{2(n-i)}, which does depend on n - i.

So, create the polynomials p and q, with coefficients

• p_i =\frac{ (-1)^i 2^i}{i!}
• q_i = \binom{4i}{2i}

and multiply them together so obtain r.

The answer for 2n is then r_n, multiplied by (2n)!^2 (which we took out at the start).

For odd N, a similar technique works: write N = 2n + 1, and once again youāll get a very similar relation on n-i. Following this, multiply the appropriate polynomials and weāre done.

Note that this problem requires multiplying two polynomials and keeping their coefficients modulo some M, where M isnāt necessarily āniceā.
Standard FFT and NTT wonāt suffice here; a tutorial on how to do this can be found at cp-algorithms.

# TIME COMPLEXITY

\mathcal{O}(K\log K) per testcase.

# CODE:

Setter's code (C++)
#include <bits/stdc++.h>
using namespace std;

int n, mod;
struct modint
{
int32_t value;
modint() = default;
modint(int32_t value_) : value(value_ % mod) {}
modint(int64_t value_) : value(value_ % mod) {}
inline modint operator+(modint other) const
{
int32_t c = this->value + other.value;
return modint(c >= mod ? c - mod : c);
}
inline modint operator-(modint other) const
{
int32_t c = this->value - other.value;
return modint(c < 0 ? c + mod : c);
}
inline modint operator*(modint other) const
{
int32_t c = (int64_t)this->value * other.value % mod;
return modint(c < 0 ? c + mod : c);
}
inline modint &operator+=(modint other)
{
this->value += other.value;
if (this->value >= mod)
this->value -= mod;
return *this;
}
inline modint &operator-=(modint other)
{
this->value -= other.value;
if (this->value < 0)
this->value += mod;
return *this;
}
inline modint &operator*=(modint other)
{
this->value = (int64_t)this->value * other.value % mod;
if (this->value < 0)
this->value += mod;
return *this;
}
inline modint operator-() const { return modint(this->value ? mod - this->value : 0); }
modint pow(int32_t k) const
{
modint x = *this, y = 1;
for (; k; k >>= 1)
{
if (k & 1)
y *= x;
x *= x;
}
return y;
}
modint inv() const { return pow(mod - 2); } // MOD must be a prime
inline modint operator/(modint other) const { return *this * other.inv(); }
inline modint operator/=(modint other) { return *this *= other.inv(); }
inline bool operator==(modint other) const { return value == other.value; }
inline bool operator!=(modint other) const { return value != other.value; }
inline bool operator<(modint other) const { return value < other.value; }
inline bool operator>(modint other) const { return value > other.value; }
};
modint operator*(int64_t value, modint n) { return modint(value) * n; }
modint operator*(int32_t value, modint n) { return modint(value) * n; }
istream &operator>>(istream &in, modint &n) { return in >> n.value; }
ostream &operator<<(ostream &out, modint n) { return out << n.value; }
struct combi
{
int n;
vector<modint> facts, finvs, invs;
combi(int _n) : n(_n), facts(_n), finvs(_n), invs(_n)
{
facts[0] = finvs[0] = 1;
invs[1] = 1;
for (int i = 2; i < n; i++)
invs[i] = invs[mod % i] * (-mod / i);
for (int i = 1; i < n; i++)
{
facts[i] = facts[i - 1] * i;
finvs[i] = finvs[i - 1] * invs[i];
}
}
inline modint fact(int n) { return facts[n]; }
inline modint finv(int n) { return finvs[n]; }
inline modint inv(int n) { return invs[n]; }
inline modint ncr(int n, int k) { return n < k or k < 0 ? 0 : facts[n] * finvs[k] * finvs[n - k]; }
inline modint aranj(int n, int k) { return ncr(n, k) * facts[k]; }
};
struct base
{
double x, y;
base() { x = y = 0; }
base(double x, double y) : x(x), y(y) {}
};
inline base operator+(base a, base b) { return base(a.x + b.x, a.y + b.y); }
inline base operator-(base a, base b) { return base(a.x - b.x, a.y - b.y); }
inline base operator*(base a, base b) { return base(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x); }
inline base conj(base a) { return base(a.x, -a.y); }
int lim = 1;
vector<base> roots = {{0, 0}, {1, 0}};
vector<int> rev = {0, 1};
const double PI = acosl(-1.0);
void ensure_base(int p)
{
if (p <= lim)
return;
rev.resize(1 << p);
for (int i = 0; i < (1 << p); i++)
rev[i] = (rev[i >> 1] >> 1) + ((i & 1) << (p - 1));
roots.resize(1 << p);
while (lim < p)
{
double angle = 2 * PI / (1 << (lim + 1));
for (int i = 1 << (lim - 1); i < (1 << lim); i++)
{
roots[i << 1] = roots[i];
double angle_i = angle * (2 * i + 1 - (1 << lim));
roots[(i << 1) + 1] = base(cos(angle_i), sin(angle_i));
}
lim++;
}
}
void fft(vector<base> &a, int n = -1)
{
if (n == -1)
n = a.size();
assert((n & (n - 1)) == 0);
int zeros = __builtin_ctz(n);
ensure_base(zeros);
int shift = lim - zeros;
for (int i = 0; i < n; i++)
if (i < (rev[i] >> shift))
swap(a[i], a[rev[i] >> shift]);
for (int k = 1; k < n; k <<= 1)
{
for (int i = 0; i < n; i += 2 * k)
{
for (int j = 0; j < k; j++)
{
base z = a[i + j + k] * roots[j + k];
a[i + j + k] = a[i + j] - z;
a[i + j] = a[i + j] + z;
}
}
}
}
vector<int> multiply(vector<int> &a, vector<int> &b, int eq = 0)
{
int need = a.size() + b.size() - 1;
int p = 0;
while ((1 << p) < need)
p++;
ensure_base(p);
int sz = 1 << p;
vector<base> A, B;
if (sz > (int)A.size())
A.resize(sz);
for (int i = 0; i < (int)a.size(); i++)
{
int x = (a[i] % mod + mod) % mod;
A[i] = base(x & ((1 << 15) - 1), x >> 15);
}
fill(A.begin() + a.size(), A.begin() + sz, base{0, 0});
fft(A, sz);
if (sz > (int)B.size())
B.resize(sz);
if (eq)
copy(A.begin(), A.begin() + sz, B.begin());
else
{
for (int i = 0; i < (int)b.size(); i++)
{
int x = (b[i] % mod + mod) % mod;
B[i] = base(x & ((1 << 15) - 1), x >> 15);
}
fill(B.begin() + b.size(), B.begin() + sz, base{0, 0});
fft(B, sz);
}
double ratio = 0.25 / sz;
base r2(0, -1), r3(ratio, 0), r4(0, -ratio), r5(0, 1);
for (int i = 0; i <= (sz >> 1); i++)
{
int j = (sz - i) & (sz - 1);
base a1 = (A[i] + conj(A[j])), a2 = (A[i] - conj(A[j])) * r2;
base b1 = (B[i] + conj(B[j])) * r3, b2 = (B[i] - conj(B[j])) * r4;
if (i != j)
{
base c1 = (A[j] + conj(A[i])), c2 = (A[j] - conj(A[i])) * r2;
base d1 = (B[j] + conj(B[i])) * r3, d2 = (B[j] - conj(B[i])) * r4;
A[i] = c1 * d1 + c2 * d2 * r5;
B[i] = c1 * d2 + c2 * d1;
}
A[j] = a1 * b1 + a2 * b2 * r5;
B[j] = a1 * b2 + a2 * b1;
}
fft(A, sz);
fft(B, sz);
vector<int> res(need);
for (int i = 0; i < need; i++)
{
long long aa = A[i].x + 0.5;
long long bb = B[i].x + 0.5;
long long cc = A[i].y + 0.5;
res[i] = (aa + ((bb % mod) << 15) + ((cc % mod) << 30)) % mod;
}
return res;
}
int main()
{
cin.tie(nullptr)->sync_with_stdio(false);
cin >> n >> mod;
combi C(n + n + 1);
int lim = n / 2;
vector<int> first(lim + 1);
for (int i = 0; i <= lim; ++i)
{
modint rep = C.fact(4 * i) * C.finv(2 * i) * C.finv(2 * i);
first[i] = rep.value;
}
vector<int> second(lim + 1);
for (int i = 0; i <= lim; ++i)
{
modint rep = modint(2).pow(i) * modint(-1).pow(i) * C.finv(i);
second[i] = rep.value;
}
vector<int> sumeven = multiply(first, second);
first = vector<int>(lim + 2);
for (int i = 1; i <= lim + 1; ++i)
{
modint rep = C.fact(4 * i - 2) * C.finv(2 * i - 1) * C.finv(2 * i - 1);
first[i] = rep.value;
}
vector<int> sumodd = multiply(first, second);
for (int i = 1; i <= n; ++i)
{
if (i % 2 == 1)
{
modint rep = C.fact(i) * C.fact(i) * sumodd[(i + 1) / 2];
cout << rep << ' ';
}
else
{
modint rep = C.fact(i) * C.fact(i) * sumeven[i / 2];
cout << rep << ' ';
}
}
}

Tester's code (C++)
#include <bits/stdc++.h>
using namespace std;
#ifdef tabr
#include "library/debug.cpp"
#else
#define debug(...)
#endif

struct input_checker {
string buffer;
int pos;

const string all = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const string number = "0123456789";
const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const string lower = "abcdefghijklmnopqrstuvwxyz";

input_checker() {
pos = 0;
while (true) {
int c = cin.get();
if (c == -1) {
break;
}
buffer.push_back((char) c);
}
}

assert(pos < (int) buffer.size());
string res;
while (pos < (int) buffer.size() && buffer[pos] != ' ' && buffer[pos] != '\n') {
res += buffer[pos];
pos++;
}
return res;
}

string readString(int min_len, int max_len, const string& pattern = "") {
assert(min_len <= max_len);
assert(min_len <= (int) res.size());
assert((int) res.size() <= max_len);
for (int i = 0; i < (int) res.size(); i++) {
assert(pattern.empty() || pattern.find(res[i]) != string::npos);
}
return res;
}

int readInt(int min_val, int max_val) {
assert(min_val <= max_val);
assert(min_val <= res);
assert(res <= max_val);
return res;
}

long long readLong(long long min_val, long long max_val) {
assert(min_val <= max_val);
assert(min_val <= res);
assert(res <= max_val);
return res;
}

vector<int> readInts(int size, int min_val, int max_val) {
assert(min_val <= max_val);
vector<int> res(size);
for (int i = 0; i < size; i++) {
if (i != size - 1) {
}
}
return res;
}

vector<long long> readLongs(int size, long long min_val, long long max_val) {
assert(min_val <= max_val);
vector<long long> res(size);
for (int i = 0; i < size; i++) {
if (i != size - 1) {
}
}
return res;
}

assert((int) buffer.size() > pos);
assert(buffer[pos] == ' ');
pos++;
}

assert((int) buffer.size() > pos);
assert(buffer[pos] == '\n');
pos++;
}

assert((int) buffer.size() == pos);
}
};

namespace math314 {
typedef long long ll;
typedef pair<int, int> Pii;

#define FOR(i, n) for (int i = 0; i < (n); i++)
#define sz(c) ((int) (c).size())
#define ten(x) ((int) 1e##x)

template <class T>
T extgcd(T a, T b, T& x, T& y) {
for (T u = y = 1, v = x = 0; a;) {
T q = b / a;
swap(x -= q * u, u);
swap(y -= q * v, v);
swap(b -= q * a, a);
}
return b;
}
template <class T>
T mod_inv(T a, T m) {
T x, y;
extgcd(a, m, x, y);
return (m + x % m) % m;
}
ll mod_pow(ll a, ll n, ll mod) {
ll ret = 1;
ll p = a % mod;
while (n) {
if (n & 1) ret = ret * p % mod;
p = p * p % mod;
n >>= 1;
}
return ret;
}

template <int mod, int primitive_root>
class NTT {
public:
int get_mod() const { return mod; }
void _ntt(vector<ll>& a, int sign) {
const int n = sz(a);
assert((n ^ (n & -n)) == 0);  // n = 2^k

const int g = 3;                               // g is primitive root of mod
int h = (int) mod_pow(g, (mod - 1) / n, mod);  // h^n = 1
if (sign == -1) h = (int) mod_inv(h, mod);     // h = h^-1 % mod

// bit reverse
int i = 0;
for (int j = 1; j < n - 1; ++j) {
for (int k = n >> 1; k > (i ^= k); k >>= 1)
;
if (j < i) swap(a[i], a[j]);
}

for (int m = 1; m < n; m *= 2) {
const int m2 = 2 * m;
const ll base = mod_pow(h, n / m2, mod);
ll w = 1;
FOR(x, m) {
for (int s = x; s < n; s += m2) {
ll u = a[s];
ll d = a[s + m] * w % mod;
a[s] = u + d;
if (a[s] >= mod) a[s] -= mod;
a[s + m] = u - d;
if (a[s + m] < 0) a[s + m] += mod;
}
w = w * base % mod;
}
}

for (auto& x : a)
if (x < 0) x += mod;
}
void ntt(vector<ll>& input) {
_ntt(input, 1);
}
void intt(vector<ll>& input) {
_ntt(input, -1);
const int n_inv = mod_inv(sz(input), mod);
for (auto& x : input) x = x * n_inv % mod;
}

// ē³ćæč¾¼ćæę¼ē®ćč”ć
vector<ll> convolution(const vector<ll>& a, const vector<ll>& b) {
int ntt_size = 1;
while (ntt_size < sz(a) + sz(b)) ntt_size *= 2;

vector<ll> _a = a, _b = b;
_a.resize(ntt_size);
_b.resize(ntt_size);

ntt(_a);
ntt(_b);

FOR(i, ntt_size) {
(_a[i] *= _b[i]) %= mod;
}

intt(_a);
return _a;
}
};

ll garner(vector<Pii> mr, int mod) {
mr.emplace_back(mod, 0);

vector<ll> coffs(sz(mr), 1);
vector<ll> constants(sz(mr), 0);
FOR(i, sz(mr) - 1) {
// coffs[i] * v + constants[i] == mr[i].second (mod mr[i].first) ćč§£ć
ll v = (mr[i].second - constants[i]) * mod_inv<ll>(coffs[i], mr[i].first) % mr[i].first;
if (v < 0) v += mr[i].first;

for (int j = i + 1; j < sz(mr); j++) {
(constants[j] += coffs[j] * v) %= mr[j].first;
(coffs[j] *= mr[i].first) %= mr[j].first;
}
}

return constants[sz(mr) - 1];
}

typedef NTT<167772161, 3> NTT_1;
typedef NTT<469762049, 3> NTT_2;
typedef NTT<1224736769, 3> NTT_3;

vector<ll> int32mod_convolution(vector<ll> a, vector<ll> b, int mod) {
for (auto& x : a) x %= mod;
for (auto& x : b) x %= mod;
NTT_1 ntt1;
NTT_2 ntt2;
NTT_3 ntt3;
auto x = ntt1.convolution(a, b);
auto y = ntt2.convolution(a, b);
auto z = ntt3.convolution(a, b);

vector<ll> ret(sz(x));
vector<Pii> mr(3);
FOR(i, sz(x)) {
mr[0].first = ntt1.get_mod(), mr[0].second = (int) x[i];
mr[1].first = ntt2.get_mod(), mr[1].second = (int) y[i];
mr[2].first = ntt3.get_mod(), mr[2].second = (int) z[i];
ret[i] = garner(mr, mod);
}

return ret;
}
}  // namespace math314

long long mod = 2;

struct mint {
long long value;
mint(long long x = 0) {
value = x % mod;
if (value < 0) value += mod;
}
mint& operator+=(const mint& other) {
if ((value += other.value) >= mod) value -= mod;
return *this;
}
mint& operator-=(const mint& other) {
if ((value -= other.value) < 0) value += mod;
return *this;
}
mint& operator*=(const mint& other) {
value = value * other.value % mod;
return *this;
}
mint& operator/=(const mint& other) {
long long a = 0, b = 1, c = other.value, m = mod;
while (c != 0) {
long long t = m / c;
m -= t * c;
swap(c, m);
a -= t * b;
swap(a, b);
}
a %= mod;
if (a < 0) a += mod;
value = value * a % mod;
return *this;
}
friend mint operator+(const mint& lhs, const mint& rhs) { return mint(lhs) += rhs; }
friend mint operator-(const mint& lhs, const mint& rhs) { return mint(lhs) -= rhs; }
friend mint operator*(const mint& lhs, const mint& rhs) { return mint(lhs) *= rhs; }
friend mint operator/(const mint& lhs, const mint& rhs) { return mint(lhs) /= rhs; }
mint& operator++() { return *this += 1; }
mint& operator--() { return *this -= 1; }
mint operator++(int) {
mint result(*this);
*this += 1;
return result;
}
mint operator--(int) {
mint result(*this);
*this -= 1;
return result;
}
mint operator-() const { return mint(-value); }
bool operator==(const mint& rhs) const { return value == rhs.value; }
bool operator!=(const mint& rhs) const { return value != rhs.value; }
bool operator<(const mint& rhs) const { return value < rhs.value; }
};
string to_string(const mint& x) {
}
ostream& operator<<(ostream& stream, const mint& x) {
return stream << x.value;
}
istream& operator>>(istream& stream, mint& x) {
stream >> x.value;
x.value %= mod;
if (x.value < 0) x.value += mod;
return stream;
}

mint power(mint a, long long n) {
mint res = 1;
while (n > 0) {
if (n & 1) {
res *= a;
}
a *= a;
n >>= 1;
}
return res;
}

vector<mint> fact(1, 1);
vector<mint> finv(1, 1);

mint C(int n, int k) {
if (n < k || k < 0) {
return mint(0);
}
while ((int) fact.size() < n + 1) {
fact.emplace_back(fact.back() * (int) fact.size());
finv.emplace_back(1 / fact.back());
}
return fact[n] * finv[k] * finv[n - k];
}

bool is_prime(long long n) {
if (n <= 1) {
return false;
}
for (long long i = 2; i * i <= n; i++) {
if (n % i == 0) {
return false;
}
}
return true;
}

int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
input_checker in;
mod = in.readInt(1e7 + 19, 1e9 + 7);
assert(is_prime(mod));
C(2 * k + 10, 0);
vector<mint> f(k + 1);
f[0] = 1;
for (int i = 2; i <= k; i += 2) {
f[i] = f[i - 2] * -4 / i;
}
vector<mint> g(k + 1);
g[0] = 1;
for (int i = 1; i <= k; i++) {
g[i] = g[i - 1] * (4 * i - 2) / i;
}
vector<long long> ff(k + 1), gg(k + 1);
for (int i = 0; i <= k; i++) {
ff[i] = f[i].value;
gg[i] = g[i].value;
}
auto ans = math314::int32mod_convolution(ff, gg, mod);
for (int i = 1; i <= k; i++) {
cout << ans[i] * fact[i] * fact[i] << " \n"[i == k];
}
return 0;
}

Editorialist's code (C++)
#include "bits/stdc++.h"
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());

template<typename T>
struct Z_p{
using Type = typename decay<decltype(T::value)>::type;
static vector<Type> MOD_INV;
constexpr Z_p(): value(){ }
template<typename U> Z_p(const U &x){ value = normalize(x); }
template<typename U> static Type normalize(const U &x){
Type v;
if(-mod() <= x && x < mod()) v = static_cast<Type>(x);
else v = static_cast<Type>(x % mod());
if(v < 0) v += mod();
return v;
}
const Type& operator()() const{ return value; }
template<typename U> explicit operator U() const{ return static_cast<U>(value); }
constexpr static Type mod(){ return T::value; }
Z_p &operator+=(const Z_p &otr){ if((value += otr.value) >= mod()) value -= mod(); return *this; }
Z_p &operator-=(const Z_p &otr){ if((value -= otr.value) < 0) value += mod(); return *this; }
template<typename U> Z_p &operator+=(const U &otr){ return *this += Z_p(otr); }
template<typename U> Z_p &operator-=(const U &otr){ return *this -= Z_p(otr); }
Z_p &operator++(){ return *this += 1; }
Z_p &operator--(){ return *this -= 1; }
Z_p operator++(int){ Z_p result(*this); *this += 1; return result; }
Z_p operator--(int){ Z_p result(*this); *this -= 1; return result; }
Z_p operator-() const{ return Z_p(-value); }
template<typename U = T>
typename enable_if<is_same<typename Z_p<U>::Type, int>::value, Z_p>::type &operator*=(const Z_p& rhs){
#ifdef _WIN32
uint64_t x = static_cast<int64_t>(value) * static_cast<int64_t>(rhs.value);
uint32_t xh = static_cast<uint32_t>(x >> 32), xl = static_cast<uint32_t>(x), d, m;
asm(
"divl %4; \n\t"
: "=a" (d), "=d" (m)
: "d" (xh), "a" (xl), "r" (mod())
);
value = m;
#else
value = normalize(static_cast<int64_t>(value) * static_cast<int64_t>(rhs.value));
#endif
return *this;
}
template<typename U = T>
typename enable_if<is_same<typename Z_p<U>::Type, int64_t>::value, Z_p>::type &operator*=(const Z_p &rhs){
uint64_t ret = static_cast<uint64_t>(value) * static_cast<uint64_t>(rhs.value) - static_cast<uint64_t>(mod()) * static_cast<uint64_t>(1.L / static_cast<uint64_t>(mod()) * static_cast<uint64_t>(value) * static_cast<uint64_t>(rhs.value));
value = normalize(static_cast<int64_t>(ret + static_cast<uint64_t>(mod()) * (ret < 0) - static_cast<uint64_t>(mod()) * (ret >= static_cast<uint64_t>(mod()))));
return *this;
}
template<typename U = T>
typename enable_if<!is_integral<typename Z_p<U>::Type>::value, Z_p>::type &operator*=(const Z_p &rhs){
value = normalize(value * rhs.value);
return *this;
}
template<typename U>
Z_p &operator^=(U e){
if(e < 0) *this = 1 / *this, e = -e;
Z_p res = 1;
for(; e; *this *= *this, e >>= 1) if(e & 1) res *= *this;
return *this = res;
}
template<typename U>
Z_p operator^(U e) const{
return Z_p(*this) ^= e;
}
Z_p &operator/=(const Z_p &otr){
Type a = otr.value, m = mod(), u = 0, v = 1;
if(a < (int)MOD_INV.size()) return *this *= MOD_INV[a];
while(a){
Type t = m / a;
m -= t * a; swap(a, m);
u -= t * v; swap(u, v);
}
assert(m == 1);
return *this *= u;
}
template<typename U> friend const Z_p<U> &abs(const Z_p<U> &v){ return v; }
Type value;
};
template<typename T> bool operator==(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value == rhs.value; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator==(const Z_p<T>& lhs, U rhs){ return lhs == Z_p<T>(rhs); }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator==(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) == rhs; }
template<typename T> bool operator!=(const Z_p<T> &lhs, const Z_p<T> &rhs){ return !(lhs == rhs); }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator!=(const Z_p<T> &lhs, U rhs){ return !(lhs == rhs); }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> bool operator!=(U lhs, const Z_p<T> &rhs){ return !(lhs == rhs); }
template<typename T> bool operator<(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value < rhs.value; }
template<typename T> bool operator>(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value > rhs.value; }
template<typename T> bool operator<=(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value <= rhs.value; }
template<typename T> bool operator>=(const Z_p<T> &lhs, const Z_p<T> &rhs){ return lhs.value >= rhs.value; }
template<typename T> Z_p<T> operator+(const Z_p<T> &lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) += rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator+(const Z_p<T> &lhs, U rhs){ return Z_p<T>(lhs) += rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator+(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) += rhs; }
template<typename T> Z_p<T> operator-(const Z_p<T> &lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) -= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator-(const Z_p<T>& lhs, U rhs){ return Z_p<T>(lhs) -= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator-(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) -= rhs; }
template<typename T> Z_p<T> operator*(const Z_p<T> &lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) *= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator*(const Z_p<T>& lhs, U rhs){ return Z_p<T>(lhs) *= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator*(U lhs, const Z_p<T> &rhs){ return Z_p<T>(lhs) *= rhs; }
template<typename T> Z_p<T> operator/(const Z_p<T> &lhs, const Z_p<T> &rhs) { return Z_p<T>(lhs) /= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator/(const Z_p<T>& lhs, U rhs) { return Z_p<T>(lhs) /= rhs; }
template<typename T, typename U, typename enable_if<is_integral<U>::value>::type* = nullptr> Z_p<T> operator/(U lhs, const Z_p<T> &rhs) { return Z_p<T>(lhs) /= rhs; }
template<typename T> istream &operator>>(istream &in, Z_p<T> &number){
typename common_type<typename Z_p<T>::Type, int64_t>::type x;
in >> x;
number.value = Z_p<T>::normalize(x);
return in;
}
template<typename T> ostream &operator<<(ostream &out, const Z_p<T> &number){ return out << number(); }

using ModType = int;
struct VarMod{ static ModType value; };
ModType VarMod::value;
ModType &mod = VarMod::value;
using Zp = Z_p<VarMod>;

// constexpr int mod = 1e9 + 7; // 1000000007
// constexpr int mod = (119 << 23) + 1; // 998244353
// constexpr int mod = 1e9 + 9; // 1000000009
// using Zp = Z_p<integral_constant<decay<decltype(mod)>::type, mod>>;

template<typename T> vector<typename Z_p<T>::Type> Z_p<T>::MOD_INV;
template<typename T = integral_constant<decay<decltype(mod)>::type, mod>>
void precalc_inverse(int SZ){
auto &inv = Z_p<T>::MOD_INV;
if(inv.empty()) inv.assign(2, 1);
for(; inv.size() <= SZ; ) inv.push_back((mod - 1LL * mod / (int)inv.size() * inv[mod % (int)inv.size()]) % mod);
}

template<typename T>
vector<T> precalc_power(T base, int SZ){
vector<T> res(SZ + 1, 1);
for(auto i = 1; i <= SZ; ++ i) res[i] = res[i - 1] * base;
return res;
}

template<typename T>
vector<T> precalc_factorial(int SZ){
vector<T> res(SZ + 1, 1); res[0] = 1;
for(auto i = 1; i <= SZ; ++ i) res[i] = res[i - 1] * i;
return res;
}

using C = complex<double>;
using vd = vector<double>;
const long double PI = 2*acos(0.0);
void fft(vector<C>& a) {
int n = size(a), L = 31 - __builtin_clz(n);
static vector<complex<long double>> R(2, 1);
static vector<C> rt(2, 1);  // (^ 10% faster if double)
for (static int k = 2; k < n; k *= 2) {
R.resize(n); rt.resize(n);
const long double PI = 2*acos(0.0);
auto x = polar(1.0L, PI / k); // M_PI, lower-case L
for (int i = k; i < 2*k; ++i) rt[i] = R[i] = i&1 ? R[i/2] * x : R[i/2];
}
vector<int> rev(n);
for (int i = 0; i < n; ++i) rev[i] = (rev[i / 2] | (i & 1) << L) / 2;
for (int i = 0; i < n; ++i) if (i < rev[i]) swap(a[i], a[rev[i]]);
for (int k = 1; k < n; k *= 2)
for (int i = 0; i < n; i += 2 * k) for (int j = 0; j < k; ++j) {
C z = rt[j+k] * a[i+j+k]; // (25% faster if hand-rolled)  /// include-line
a[i + j + k] = a[i + j] - z;
a[i + j] += z;
}
}
vd conv(const vd& a, const vd& b) {
if (a.empty() || b.empty()) return {};
vd res(size(a) + size(b) - 1);
int L = 32 - __builtin_clz(size(res)), n = 1 << L;
vector<C> in(n), out(n);
copy(begin(a), end(a), begin(in));
for (int i = 0; i < (int)size(b); ++i) in[i].imag(b[i]);
fft(in);
for (auto &x : in) x *= x;
for (int i = 0; i < n; ++i) out[i] = in[-i & (n - 1)] - conj(in[i]);
fft(out);
for (int i = 0; i < (int)size(res); ++i) res[i] = imag(out[i]) / (4 * n);
return res;
}

// FFT-MOD
typedef vector<ll> vl;
vl convMod(const vl &a, const vl &b, int M) {
if (a.empty() || b.empty()) return {};
vl res(size(a) + size(b) - 1);
int B=32-__builtin_clz(size(res)), n=1<<B, cut=int(sqrt(M));
vector<C> L(n), R(n), outs(n), outl(n);
for (int i = 0; i < (int)size(a); ++i) L[i] = C((int)a[i] / cut, (int)a[i] % cut);
for (int i = 0; i < (int)size(b); ++i) R[i] = C((int)b[i] / cut, (int)b[i] % cut);
fft(L), fft(R);
for (int i = 0; i < n; ++i) {
int j = -i & (n - 1);
outl[j] = (L[i] + conj(L[j])) * R[i] / (2.0 * n);
outs[j] = (L[i] - conj(L[j])) * R[i] / (2.0 * n) / 1i;
}
fft(outl), fft(outs);
for (int i = 0; i < (int)size(res); ++i) {
ll av = ll(real(outl[i])+.5), cv = ll(imag(outs[i])+.5);
ll bv = ll(imag(outl[i])+.5) + ll(real(outs[i])+.5);
res[i] = ((av % M * cut + bv) % M * cut + cv) % M;
}
return res;
}

int main()
{
ios::sync_with_stdio(false); cin.tie(0);

int k; cin >> k >> mod;

auto fac = precalc_factorial<Zp>(2*k+10);
auto C = [&] (int n, int r) {
if (n < r or r < 0) return Zp(0);
return fac[n] / (fac[r] * fac[n-r]);
};

vector<ll> poly1(k/2 + 1), poly2(k/2 + 1), poly3(k/2 + 1);
for (int i = 0; i <= k/2; ++i) {
Zp cur = (Zp(-2) ^ i) / fac[i];
poly1[i] = cur.value;

cur = C(4*i, 2*i);
poly2[i] = cur.value;

cur = C(4*i+2, 2*i+1);
poly3[i] = cur.value;
}

auto res1 = convMod(poly1, poly2, mod);
auto res2 = convMod(poly1, poly3, mod);

cout << res2[0] << ' ';
for (int i = 1; i < (k+1)/2; ++i) {
cout << fac[2*i] * fac[2*i] * res1[i] << ' ';
cout << fac[2*i+1] * fac[2*i+1] * res2[i] << ' ';
}
if (k%2 == 0) cout << fac[k] * fac[k] * res1[k/2] << '\n';
}


This problem can be solved more simply using the theory of holonomic functions. As explained above, the number of n-beautiful permutations is b_n:=n!^2 \ a_n, where

a_n = \sum_{0\le i\le n/2} \frac{(-2)^i}{i!} {2n - 4i \choose n - 2i}.

But since (1-4z)^{-1/2}=\sum_{i\ge 0} {2i \choose i} z^i, a_n will equal the coefficient of z^n in

f(z):=e^{-2 z^2} (1-4 z)^{-1/2}.

Now

f'(z)=(-4z + 2 (1-4z)^{-1})f(z)

so

(1 - 4 z) f'(z) = (2 - 4 z + 16 z^2) f(z)

and, taking the coefficient of z^n and assuming n\ge 2,

(n+1) a_{n+1} - 4 n a_n = 2 a_n - 4 a_{n-1} + 16 a_{n-2}.

Therefore

a_{n+1} = (( 4 n+2) a_n - 4 a_{n-1} + 16 a_{n-2}) (n + 1)^{-1}

and if we multiply by (n+1)!^2, we can get a recurrence for b_n:

b_{n+1} = (n + 1) (( 4 n+2) b_n - 4 b_{n-1} n^2 + 16(n-1)^2n^2 b_{n-2}),

still assuming that n\ge 2. Together with the initial values b_0=1, b_1=2, and b_2=16, this gives a short, quick way of calculating b_1, \dots, b_K modulo M without needing to use Fourier transforms or number-theoretic transforms.

1 Like