# COMBI_IS_FUN - Editorial

Authors: lawliet_p and satyam_343
Tester: tabr
Editorialist: iceknight1093

TBD

# PREREQUISITES:

Combinatorics — specifically stars and bars, the inclusion-exclusion principle

# PROBLEM:

For a fixed parameter D and a multiset S, the cost of a partition of S into several non-empty multisets S_1, S_2, \ldots, S_k equals

\sum_{i=1}^N (D + \max(S_i) - \min(S_i))

Define F(S, D) to be the minimum cost of a partition of S with parameter D.

You’re given N, M, D, and K.
Find the number of distinct multisets A of length N such that:

• 1 \leq A_i \leq M for each i; and
• F(A, D) = K

# EXPLANATION:

First, let’s see how to compute F(A, D) for a fixed multiset A.

Computing F(A, D)

Let A_1 \leq A_2 \leq \ldots \leq A_N be the multiset.
We’ll call a partition optimal if it attains F(A, D).

Claim: There exists an optimal partition such that every subset will consist of a contiguous segment of the A_i.
Proof: Consider some partition of A into subsets. Suppose there are indices i \lt j \lt k such that i and k belong to the same subset (say S_1), but j doesn’t (say it’s in S_2).
Then, moving j from S_2 to S_1:

• Doesn’t change the contribution of S_1 to the cost at all, since j is in the middle.
• Doesn’t increase the contribution of S_2 to the cost; since either it was in the middle and didn’t matter, or was an endpoint and its removal brought the endpoints closer, hence lowering cost.

So, moving j to S_1 is not worse.
By repeatedly performing this process on the j with minimal index, we see that in at most N moves we reach a state where each subset is a segment, thus proving the claim.

Now, consider some partition of A into segments; say there are k segments [L_i, R_i].
The cost of this partition is

\sum_{i=1}^k (D + A_{R_i} - A_{L_i}) = k\cdot D + \sum_{i=1}^k (A_{R_i} - A_{L_i}) \\ = k\cdot D + \sum_{i=1}^k\left(\sum_{j=L_i}^{R_i - 1} (A_{j+1} - A_j) \right)

This means we’re essentially summing up differences between all pairs of adjacent elements that lie in the same segment; and adding D for each adjacent pair that isn’t in the same segment (along with one extra D term).
Thinking of this differently, for each adjacent pair of elements (A_i, A_{i+1}), we can:

• Take them into the same segment, for a cost of A_{i+1} - A_i; or
• Keep them in different segments, for a cost of D.

Together, this tells us that the minimum possible cost is simply

F(A, D) = D + \sum_{i=2}^N \min(A_i - A_{i-1}, D)

To apply the above condition, we also need a nice enough model of a multiset.

Consider a multiset A of length N with elements between 1 and M, such that A_i \leq A_{i+1}.
Note that this multiset is determined uniquely by:

• The value of A_1;
• The sequence of adjacent differences (A_{i+1} - A_i); and
• The value A_N

This is a useful characterization because as we saw earlier, we’re interested in adjacent differences.
Further, if we define A_0 := 0 and A_{N+1} := M, then A_1 and A_N are also defined by adjacent differences (to 0 and M respectively).

That is, if we let B_i = A_i - A_{i-1}, then A is determined uniquely by the N+1 values [B_1, B_2, \ldots, B_{N+1}].
Note that there are a couple of constraints on the B_i values:

• B_1 \geq 1, because we want A_1 \geq 1.
• B_1 + B_2 + \ldots + B_{N+1} = M, since we start at 0 and end at M.
Considering (B_1 - 1) instead of B_1, we replace the above two constraints by B_1 \geq 0 and sum(B_i) = M-1 instead.

Let’s now see how we can count valid multisets.
B_1 and B_{N+1} don’t contribute to the cost at all.
For each 1 \lt i \leq N, we add \min(D, B_i) to the cost.
There’s always an extra D added to the cost; so let’s just work with K' = K-D as the target cost instead.

We’ll treat elements that are \lt D and \geq D differently.
Suppose x of the B_i are \geq D.
We fix their positions in C(N-1, x) ways.
Note that:

• Each of these x indices contribute D to the cost, and must contain values that are \geq D.
• Each of the other N-1-x indices contribute B_i to the cost, and must contain values that are \lt D.
• Further, the sum of B_i in the second case must equal exactly K' - x\cdot D for the total cost to equal K'.
• B_1 and B_{N+1} are mostly unconstrained; they just need to be \geq 0, and as noted above the sum of all B_i should be M-1.

Without loss of generality, suppose B_2, B_3, \ldots, B_{x+1} are to be \geq D.
Then, for each of them, we can write B_i = D + C_i, where C_i \geq 0.
The C_i are otherwise unconstrained.

Putting everything together, we have:

B_1 + B_2 + \ldots + B_{N+1} = M-1 \\ B_1 + B_{N+1} + (B_2 + B_3 + \ldots + B_{x+1}) + (B_{x+2} + \ldots + B_N) = M-1 \\ B_1 + B_{N+1} +x\cdot D + (C_2 + C_3 + \ldots + C_{x+1}) + (K' - x\cdot D) = M-1 \\ B_1 + B_{N+1} + C_2 + C_3 + \ldots + C_{x+1} = M-1-K' \\

Here, each of the C_i, and both B_1 and B_{N+1} only need to be non-negative integers.
The number of solutions to this equation can thus be found by stars and bars.

That only leaves the values [B_{x+2}, B_{x+3}, \ldots, B_{N}].
Each of these must be \lt D, and their overall sum must equal K' - x\cdot D.
Counting the number of solutions to this can be done by combining the stars-and-bars method with inclusion-exclusion.

How?

If there were no upper bound D, this is a direct application of stars-and-bars.

Now, let’s define P_i: the number of arrangements such that exactly i of the indices violate the upper bound, i.e, are \geq D.
We want to compute P_0.
As is usual for inclusion-exclusion tasks, when exactness is hard to deal with, we relax the constraints a bit.
Define Q_i to be the number of arrangements such that at least i of the indices violate the upper bound.

Computing Q_i is not hard: fix which of the i positions are being violated, and then everything is free with the target sum being K' - x\cdot D - i\cdot D (using a similar trick as we did earlier when converting B_i to C_i).

Inclusion-exclusion then tells us that

P_0 = \sum_{i=0}^{N-1-x} (-1)^i \binom{N-1-x}{i} Q_i

which can be computed in \mathcal{O}(N) time once all the Q_i are known.

The problem is thus solved in \mathcal{O}(N^2) time.

# TIME COMPLEXITY

\mathcal{O}(N^2) per testcase.

# CODE:

Author's code (C++)
#pragma GCC optimize("O3,unroll-loops")
#include <bits/stdc++.h>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
using namespace std;
#define ll long long
#define pb push_back
#define mp make_pair
#define nline "\n"
#define f first
#define s second
#define pll pair<ll,ll>
#define all(x) x.begin(),x.end()
#define vl vector<ll>
#define vvl vector<vector<ll>>
#define vvvl vector<vector<vector<ll>>>
#ifndef ONLINE_JUDGE
#define debug(x) cerr<<#x<<" "; _print(x); cerr<<nline;
#else
#define debug(x);
#endif
void _print(ll x){cerr<<x;}
void _print(char x){cerr<<x;}
void _print(string x){cerr<<x;}
template<class T,class V> void _print(pair<T,V> p) {cerr<<"{"; _print(p.first);cerr<<","; _print(p.second);cerr<<"}";}
template<class T>void _print(vector<T> v) {cerr<<" [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T>void _print(set<T> v) {cerr<<" [ "; for (T i:v){_print(i); cerr<<" ";}cerr<<"]";}
template<class T>void _print(multiset<T> v) {cerr<< " [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T,class V>void _print(map<T, V> v) {cerr<<" [ "; for(auto i:v) {_print(i);cerr<<" ";} cerr<<"]";}
typedef tree<ll, null_type, less<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;
typedef tree<ll, null_type, less_equal<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_multiset;
typedef tree<pair<ll,ll>, null_type, less<pair<ll,ll>>, rb_tree_tag, tree_order_statistics_node_update> ordered_pset;
//--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
const ll MOD=998244353;
const ll MAX=5000500;
vector<ll> fact(MAX+2,1),inv_fact(MAX+2,1);
ll binpow(ll a,ll b,ll MOD){
ll ans=1;
a%=MOD;
while(b){
if(b&1)
ans=(ans*a)%MOD;
b/=2;
a=(a*a)%MOD;
}
return ans;
}
ll inverse(ll a,ll MOD){
return binpow(a,MOD-2,MOD);
}
void precompute(ll MOD){
for(ll i=2;i<MAX;i++){
fact[i]=(fact[i-1]*i)%MOD;
}
inv_fact[MAX-1]=inverse(fact[MAX-1],MOD);
for(ll i=MAX-2;i>=0;i--){
inv_fact[i]=(inv_fact[i+1]*(i+1))%MOD;
}
}
ll nCr(ll a,ll b,ll MOD){
if(a==b){
return 1;
}
if((a<0)||(a<b)||(b<0))
return 0;
ll denom=(inv_fact[b]*inv_fact[a-b])%MOD;
return (denom*fact[a])%MOD;
}
ll getv(ll n,ll max_val,ll need){
ll ans=0,sgn=1;
for(ll i=0;i<=n;i++){
ans+=sgn*nCr(n,i,MOD)*nCr(need-i*(max_val+1)+n-1,n-1,MOD);
ans%=MOD;
sgn*=-1;
}
ans=(ans+MOD)%MOD;
return ans;
}
ll sum=0;
void solve(){
ll n,m,d,k; cin>>n>>m>>d>>k;
sum+=n;
ll ans=0;
for(ll i=0;i<=n-1;i++){
ll cur_cost=i*d;
if(cur_cost>k){
break;
}
ll rem_cost=k-cur_cost-d;
ll now=getv(n-1-i,d-1,rem_cost);
now=(now*nCr(n-1,i,MOD))%MOD;
ll left_sum=m-k+d;
now=(now*nCr(left_sum+i,i+1,MOD))%MOD;
ans=(ans+now)%MOD;
}
cout<<ans<<nline;
return;
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
#ifndef ONLINE_JUDGE
freopen("input.txt", "r", stdin);
freopen("output.txt", "w", stdout);
freopen("error.txt", "w", stderr);
#endif
ll test_cases=1;
cin>>test_cases;
precompute(MOD);
while(test_cases--){
solve();
}
debug(sum);
cout<<fixed<<setprecision(10);
cerr<<"Time:"<<1000*((double)clock())/(double)CLOCKS_PER_SEC<<"ms\n";
}

Tester's code (C++)
// ignore \r
#include <bits/stdc++.h>
using namespace std;

struct input_checker {
string buffer;
int pos;

const string all = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const string number = "0123456789";
const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const string lower = "abcdefghijklmnopqrstuvwxyz";

input_checker() {
pos = 0;
while (true) {
int c = cin.get();
if (c == -1) {
break;
} else if (c == '\r') {
continue;
}
buffer.push_back((char) c);
}
}

assert(pos < (int) buffer.size());
string res;
while (pos < (int) buffer.size() && buffer[pos] != ' ' && buffer[pos] != '\n') {
res += buffer[pos];
pos++;
}
return res;
}

string readString(int min_len, int max_len, const string& pattern = "") {
assert(min_len <= max_len);
assert(min_len <= (int) res.size());
assert((int) res.size() <= max_len);
for (int i = 0; i < (int) res.size(); i++) {
assert(pattern.empty() || pattern.find(res[i]) != string::npos);
}
return res;
}

int readInt(int min_val, int max_val) {
assert(min_val <= max_val);
assert(min_val <= res);
assert(res <= max_val);
return res;
}

long long readLong(long long min_val, long long max_val) {
assert(min_val <= max_val);
assert(min_val <= res);
assert(res <= max_val);
return res;
}

vector<int> readInts(int size, int min_val, int max_val) {
assert(min_val <= max_val);
vector<int> res(size);
for (int i = 0; i < size; i++) {
if (i != size - 1) {
}
}
return res;
}

vector<long long> readLongs(int size, long long min_val, long long max_val) {
assert(min_val <= max_val);
vector<long long> res(size);
for (int i = 0; i < size; i++) {
if (i != size - 1) {
}
}
return res;
}

assert((int) buffer.size() > pos);
assert(buffer[pos] == ' ');
pos++;
}

assert((int) buffer.size() > pos);
assert(buffer[pos] == '\n');
pos++;
}

assert((int) buffer.size() == pos);
}
};

template <long long mod>
struct modular {
long long value;
modular(long long x = 0) {
value = x % mod;
if (value < 0) value += mod;
}
modular& operator+=(const modular& other) {
if ((value += other.value) >= mod) value -= mod;
return *this;
}
modular& operator-=(const modular& other) {
if ((value -= other.value) < 0) value += mod;
return *this;
}
modular& operator*=(const modular& other) {
value = value * other.value % mod;
return *this;
}
modular& operator/=(const modular& other) {
long long a = 0, b = 1, c = other.value, m = mod;
while (c != 0) {
long long t = m / c;
m -= t * c;
swap(c, m);
a -= t * b;
swap(a, b);
}
a %= mod;
if (a < 0) a += mod;
value = value * a % mod;
return *this;
}
friend modular operator+(const modular& lhs, const modular& rhs) { return modular(lhs) += rhs; }
friend modular operator-(const modular& lhs, const modular& rhs) { return modular(lhs) -= rhs; }
friend modular operator*(const modular& lhs, const modular& rhs) { return modular(lhs) *= rhs; }
friend modular operator/(const modular& lhs, const modular& rhs) { return modular(lhs) /= rhs; }
modular& operator++() { return *this += 1; }
modular& operator--() { return *this -= 1; }
modular operator++(int) {
modular res(*this);
*this += 1;
return res;
}
modular operator--(int) {
modular res(*this);
*this -= 1;
return res;
}
modular operator-() const { return modular(-value); }
bool operator==(const modular& rhs) const { return value == rhs.value; }
bool operator!=(const modular& rhs) const { return value != rhs.value; }
bool operator<(const modular& rhs) const { return value < rhs.value; }
};
template <long long mod>
string to_string(const modular<mod>& x) {
}
template <long long mod>
ostream& operator<<(ostream& stream, const modular<mod>& x) {
return stream << x.value;
}
template <long long mod>
istream& operator>>(istream& stream, modular<mod>& x) {
stream >> x.value;
x.value %= mod;
if (x.value < 0) x.value += mod;
return stream;
}

constexpr long long mod = 998244353;
using mint = modular<mod>;

mint power(mint a, long long n) {
mint res = 1;
while (n > 0) {
if (n & 1) {
res *= a;
}
a *= a;
n >>= 1;
}
return res;
}

vector<mint> fact(1, 1);
vector<mint> finv(1, 1);

mint C(int n, int k) {
if (n < k || k < 0) {
return mint(0);
}
while ((int) fact.size() < n + 1) {
fact.emplace_back(fact.back() * (int) fact.size());
finv.emplace_back(mint(1) / fact.back());
}
return fact[n] * finv[k] * finv[n - k];
}

mint f(int n, int k) {
if (n == 0 && k == 0) {
return 1;
} else {
return C(n + k - 1, k);
}
}

int main() {
input_checker in;
int sn = 0, sm = 0;
while (tt--) {
sn += n;
sm += m;
mint ans = 0;
k -= d;
for (int cnt = 0; cnt <= n - 1; cnt++) {
if (k - d * cnt < 0 || m - (k - d * cnt) < 0) {
continue;
}
int x = k - d * cnt;
int y = m - k - 1;
mint t = 0;
// put x balls into (n - cnt - 1) boxes, less than d balls for each box
for (int i = 0; i <= n - cnt - 1; i++) {
if (d * i > x) {
break;
}
int z = x - d * i;
mint s = (i % 2 == 1 ? -1 : +1) * C(n - cnt - 1, i) * f(n - cnt - 1, z);
t += s;
// cerr << i << " " << s << endl;
}
// put y balls into (cnt + 2) boxes
t *= C(y + cnt + 1, cnt + 1);
// choose cnt boxes from n - 1
t *= C(n - 1, cnt);
ans += t;
// cerr << cnt << ": " << t << endl;
}
cout << ans << '\n';
}
assert(sn <= 5000);
assert(sm <= 5e6);
return 0;
}

Editorialist's code (Python)
mod = 998244353
lim = 2 * 10**6 + 20
fac =  + [i for i in range(1, lim)]
for i in range(1, lim):
fac[i] = fac[i-1] * i % mod
invf = *lim
invf[-1] = pow(fac[-1], mod-2, mod)
for i in reversed(range(lim-1)):
invf[i] = invf[i+1] * (i+1) % mod

def C(n, r):
if n < r or r < 0: return 0
return fac[n] * invf[r] % mod * invf[n-r] % mod

def calc(k, n): # sum k non-negative integers to get n
if k == 0: return 1 if n == 0 else 0
return C(n+k-1, n)

for _ in range(int(input())):
n, m, d, k = map(int, input().split())
ans = 0
k -= d

for i in range(n):
# Fix the number of positions that are >= d
if i*d > k: break
ways = C(n-1, i) * calc(i+2, m-1-k) % mod
sm, sign = 0, 1
for j in range(n-i):
if k - i*d - j*d < 0: break
sm += C(n-1-i, j) * calc(n-1-i, k - i*d - j*d) % mod * sign % mod
sign *= -1
ans += ways * sm % mod
print(ans % mod)


Extremely satisfying explanation!! Thanks a lot!