EXPGROUP - Editorial

PROBLEM LINK:

Practice
Div-2 Contest
Div-1 Contest

Author: Jatin Yadav
Tester: Rahul Dugar
Editorialist: Mohammed Ehab

DIFFICULTY:

HARD

PREREQUISITES:

Divide and Conquer.
It helps to know about generating functions.
To be able to implement the solution, you need to know about polynomial operations like inverse series.

PROBLEM:

N students enter a room one by one, and they want to form groups. When a student enters, if there are g groups, he creates a new group containing only him with probability \frac{1}{g+1}, or he joins one of the existing groups uniformly at random, so each group has probability \frac{1}{g+1} the student will join. Find the expected value of the sum of squares of the groups’ sizes.

EXPLANATION:

Let’s start with a slightly simpler problem. What’s the expected value of the number of groups? Well, we need to keep track of which students started new groups. Suppose students with indices j_1, j_2, \ldots, j_k started k groups. What’s the probability this happens? Well, when student j_i entered, there were i-1 groups, so he had a probability of \frac{1}{i} to start a new group. And then, each student between j_i and j_{i+1} entered and found i groups and decided to join one of them. The probability he joins one of them is \frac{i}{i+1}. Let’s define x_i=j_{i+1}-j_i-1, the number of students who entered and found i groups and decided to join one of them (j_{k+1}=N+1.) The probability students j_1, j_2, \ldots, j_k started k groups is hence:

\prod\limits_{i=1}^{k} \frac{1}{i}(\frac{i}{i+1})^{x_i}

Now, to get the probability you’ll form k groups, we want to sum this expression up for all possible values x_1, x_2, \ldots, x_k. The only restrictions on these values are:

  • They’re non-negative
  • The must sum up to N-k. That’s because their sum is the number of students which don’t form a new group.

So we’re interested in:

\sum_{x_1+x_2+\dots+x_k=n-k} \prod\limits_{i=1}^{k} \frac{1}{i}(\frac{i}{i+1})^{x_i}

Now, if you haven’t seen such sums before, don’t be intimidated. There’s a way to deal with them. Let’s first try to simplify how it looks. Let f_i(x)=\frac{1}{i}(\frac{i}{i+1})^x. We want:

\sum_{x_1+x_2+\dots+x_k=n-k} f_1(x_1)f_2(x_2) \ldots f_k(x_k)

How do we deal with sums of products? Well, which mathematical objects have both summation and multiplication when we multiply 2 of them?.. it’s polynomials. When we multiply 2 polynomials Ax^i and Bx^j, the exponents add together and the coefficients multiply together, forming ABx^{i+j}. In that spirit, let’s define G_i as follows:

G_i(x)=\sum\limits_{r=0}^{\infty} f_i(r)x^r

This is basically a way to store the values of f_i called a generating function. We just store it as the coefficients of an infinite polynomial. The motivation is that multiplying G_i together is meaningful. Also, the motivation for summing up to infinity instead of N is to get a formula for this series, that we can actually use in algebraic manipulation! Now, what happens is you multiply G_1, G_2, \ldots, G_k together? Look at the coefficient of x^{N-k}. This is formed by picking exponents r_1, r_2, \ldots, r_k. Their sum is N-k, and they add what to the coefficient? They add f_1(r_1)f_2(r_2) \ldots f_k(r_k). This is the sum we’re interested in!

Now, let’s understand our generating function a bit better. What does it actually sum up to? Well, notice that the sum is actually a geometric progression with common ratio \frac{i}{i+1}x. So:

G_i(x)=\sum\limits_{r=0}^{\infty} \frac{1}{i}(\frac{i}{i+1})^rx^r=\frac{1}{i(1-\frac{i}{i+1}x)}

Let P_i(x)=i(1-\frac{i}{i+1}x), the denominator. Then, remember that the probability k groups will form is now just the coefficient of x^{N-k} in \prod\limits_{i=1}^{k} \frac{1}{P_i(x)}. Or, alternatively, it’s the coefficient of x^N in x^k\prod\limits_{i=1}^{k} \frac{1}{P_i(x)}. To get the expected value of the number of groups, you just need to multiply this probability by k, sum up across all k, and take the coefficient of x^N. The sum is:

\sum\limits_{k=1}^{N} kx^k\prod\limits_{i=1}^{k} \frac{1}{P_i(x)}

Now, having a polynomial that depends on k in the denominator is very annoying. So let’s try to move it to the nominator. Let A(x)=\prod\limits_{i=1}^{N} P_i(x). Then, if you divide \prod\limits_{i=k+1}^{N} P_i(x) by A(x), the common terms cancel, and the terms from 1 to k are left in the denominator. So you can instead calculate:

C(x)=\sum\limits_{k=1}^{N} kx^k\prod\limits_{i=k+1}^{N} P_i(x)

And then divide it by A(x) afterwards, to get the desired answer. I’ll describe an approach to calculate C(x) using divide and conquer. We’ll create a recursive function, solve(l,r), that returns 3 polynomials:

A(x)=\prod\limits_{i=l}^{r} P_i(x)
B(x)=\sum\limits_{k=1}^{r-l+1} x^k\prod\limits_{i=l+k}^{r} P_i(x)
C(x)=\sum\limits_{k=1}^{r-l+1} kx^k\prod\limits_{i=l+k}^{r} P_i(x)

We’re not really interested in B(x), but we need it for the recurrence. I’ll use the subscript left to denote the polynomials returned by solve(l,mid), right to denote the ones returned by solve(mid+1,r), and m=mid-l+1, then you can easily verify that:

A(x)=A_{left}(x)A_{right}(x)
B(x)=B_{left}(x)A_{right}(x)+x^mB_{right}(x)
C(x)=C_{left}(x)A_{right}(x)+x^m(mB_{right}(x)+C_{right}(x))

And finally, the coefficient of x^N in \frac{C(x)}{A(x)} (call it X_N) gives the expected number of groups. Phew!

But wait, we’re not done. The problem wanted the expected sum of squares of sizes. Actually, there’s a very elegant reduction to the one we solved:

When student i walks into the room, how does he change the expected sum of squares of group sizes? Suppose he walks in to find g groups of sizes s_1, s_2, \ldots, s_g. You may pretend there’s one more empty group of size s_{g+1}, and creating his own group is equivalent to joining this one. Now, he joins any group with probability \frac{1}{g+1}. If he joins a group of size s, the increase in the square of its size is (s+1)^2-s^2=2s+1. Taking the average across all groups, the expected increase is 1+\frac{2}{g+1}\sum\limits_{i=1}^{g+1} s_i. Now, a very nice cancellation happens. The sum of the sizes of the groups is just i-1! So the expected increase is 1+2(i-1)E[\frac{1}{g+1}]. So everything boils down to calculating this very weird quantity, E[\frac{1}{g+1}]. I mean, the expected value of the number of groups or the sum of squares is acceptable, but the expected value of one over (the number of groups+1)?? It turns out to have a very nice meaning. Remember that a student starts his own group with probability \frac{1}{g+1}, so the expected value of this quantity just means the expected increase in the number of groups, which is X_{i}-X_{i-1}!

SOLUTIONS:

Setter's Solution
#include <bits/stdc++.h>
using namespace std;
 
#define ll long long
#define vi vector<int>
 
const int g = 3, mod =  998244353, p =  998244353;
 
inline int add(int x, int y){ x += y; if(x >= mod) x -= mod; return x;}
inline int sub(int x, int y){ x -= y; if(x < 0) x += mod; return x;}
inline int mul(int x, int y){ return (x * 1ll * y) % mod;}
inline int powr(int a, ll b){
    int x = 1 % mod;
    while(b){
        if(b & 1) x = mul(x, a);
        a = mul(a, a);
        b >>= 1;
    }
    return x;
}
inline int inv(int a){ return powr(a, mod - 2);}
 
const int MX = 18;
int W[1 << MX], invW[1 << MX]; // max polynomial input/output -> (1 << MX)
int maxn;
 
void precompute_powers(){
    int p2 = p - 1, MAXN = 0;
    while(p2 % 2 == 0){
        p2 >>= 1;
        MAXN ++;
    }
    MAXN = min(MAXN, MX);
    int g1 = powr(g, (p - 1) >> MAXN);
    maxn = 1 << MAXN;
    int st = 1;
    for(int i = 0; i < maxn; i++){
        W[i] = st;
        st = mul(st, g1);
    }
    for(int i = 0; i < maxn; i++){
        invW[i] = (i == 0) ? 1 : W[maxn - i];
    }
}
 
void fft (vector<int> & a, bool invert) {
    int n = (int) a.size();
 
    for (int i=1, j=0; i<n; ++i) {
        int bit = n >> 1;
        for (; j>=bit; bit>>=1)
            j -= bit;
        j += bit;
        if (i < j)
            swap (a[i], a[j]);
    }
 
    for (int len=2; len<=n; len<<=1) {
        for (int i=0; i<n; i+=len) {
            int ind = 0,ADD = maxn/len;
            for (int j=0; j<len/2; ++j) {
                int u = a[i+j],  v = mul(a[i+j+len/2], (invert?invW[ind]:W[ind]));
                a[i+j] = add(u, v);
                a[i+j+len/2] = sub(u, v);
                ind += ADD;
            }
        }
    }
    if (invert){
        int invn = inv(n);
        for (int i=0; i<n; ++i) a[i] = mul(a[i], invn);
    }
}
 
vi shift(vi P, int k){
    P.resize(k + P.size());
    for(int i = (int)P.size() - 1; i >= 0; i--) P[i] = i >= k ? P[i - k] : 0;
    return P;
}
 
vi add(vi a, vi b){
    vi ret(max(a.size(), b.size()));
    for(int i = 0; i < ret.size(); i++){
        ret[i] = add(i < a.size() ? a[i] : 0, i < b.size() ? b[i] : 0);
    }
    return ret;
}
 
vi sub(vi a, vi b){ 
    vi ret(max(a.size(), b.size()));
    for(int i = 0; i < ret.size(); i++){
        ret[i] = sub(i < a.size() ? a[i] : 0, i < b.size() ? b[i] : 0);
    }
    return ret;
}
 
void mul(vi & a, vi b, int upper_limit = 1e9){
    int sz = a.size() + b.size() - 1;
    int k = 0;
    while((1 << k) < sz) k++;
    a.resize(1 << k); b.resize(1 << k);
    fft(a, 0); fft(b, 0);
    for(int i = 0; i < (1 << k); i++)
        a[i] = mul(a[i], b[i]);
    fft(a, 1);
    a.resize(min(upper_limit, sz));
}
 
vi mul_scalar(vi v, int k){
    for(auto & it : v) it = mul(k, it);
    return v;   
}
 
vi inverse(vi a, int sz){
    assert(a[0] != 0);
    vi x = {inv(a[0])};
    while(x.size() < sz){
        vi temp(a.begin(), a.begin() + min(a.size(), 2 * x.size()));
        vi nx = x;
        mul(nx, x);
        mul(nx, temp);
        x.resize(2 * x.size());
        for(int i = 0; i < x.size(); i++)
            x[i] = sub(add(x[i], x[i]), nx[i]);
    }
    x.resize(sz);
    return x;
}
 
const int N = 100005;
vi A[N * 4], B[N * 4], C[N * 4];
 
void go(int s, int e, int ind){
    if(s == e){
        int P = mul(s, inv(s + 1));
        A[ind] = {s, sub(0, mul(s, P))};
        B[ind] = {0, 1};
        C[ind] = {0, 1};
        return;
    }
    int mid = (s + e) >> 1;
    go(s, mid, 2 * ind);
    go(mid + 1, e, 2 * ind + 1);
 
    int k = mid - s + 1;
    mul(A[2 * ind], A[2 * ind + 1]);
    mul(B[2 * ind], A[2 * ind + 1]);
    mul(C[2 * ind], A[2 * ind + 1]);
    A[ind] = A[2 * ind];
    B[ind] = add(B[2 * ind], shift(B[2 * ind + 1], k));
    C[ind] = add(C[2 * ind], shift( add( mul_scalar(B[2 * ind + 1], k), C[2 * ind + 1] ), k));
}
 
// N log^2 N
int main(){
    precompute_powers();
    int n; cin >> n;
    go(1, n, 1);
    mul(C[1], inverse(A[1], n + 1));
    // E[1/(g+1)] = E[X_n] - E[x_n - 1]
    // E[sum of squares] = n + 2 * sum E[(i - 1)/(g_i + 1)]
    int ans = n;
    for(int i = 2; i <= n; i++){
        ans = add(ans, mul(2 * (i - 1), sub(C[1][i], C[1][i - 1])));
    }
    cout << ans << endl;
}

VIDEO EDITORIAL: