RANDCHCL - Editorial

tmwilliamlin · March 22, 2020, 6:35pm

PROBLEM LINK:

Practice
Div-1 Contest

Author: Anik Sarker
Tester: Raja Vardhan Reddy
Editorialist: William Lin

DIFFICULTY:

Medium

PREREQUISITES:

Combinatorics

PROBLEM:

Given an array W of length N, find the expected score from the following process:

Choose an integer k between 1 and N uniformly randomly.
Let A be k not necessarily distinct elements chosen uniformly randomly and independently from W.
Let B be chosen in the same way as A.
The score is \sum_{i=1}^k \sum_{j =1}^k \gcd(A_i, B_j).

QUICK EXPLANATION:

The score is just k^2 times the expected value of \gcd(W_i, W_j), so the answer is just the expected value of k^2 times the expected value of \gcd(W_i, W_j). To find the expected value of \gcd(W_i, W_j), we will iterate over all g and count the number of pairs such that \gcd(W_i, W_j)=g.

EXPLANATION:

Note that the score is just k^2 times the expected value of \gcd(W_i, W_j), where i and j are chosen uniformly randomly.

Proof

All A_i are chosen in the same way, and the same is true for all B_j. Thus, there is no reason for the expected value of \gcd(A_i, B_j) to be different for the k^2 pairs (i, j).

Since k is independent from \gcd(W_i, W_j), the answer is the expected value of k^2 times the expected value of \gcd(W_i, W_j).

We can easily calculate the expected value of k^2.

Explanation

Add up k^2 for k from 1 to N and divide by N. Alternatively, use the well-known formula for the sum of the first k squares, which is \frac{k(k+1)(2k+1)}{6}.

What remains is to calculate the expected value of \gcd(W_i, W_j), which is the same as the sum of all \gcd(W_i, W_j) divided by N^2.

Finding the sum of all \gcd(W_i, W_j) is a well-known problem.

Explanation

First, we will calculate a frequency array c of all the elements in W. Then, we will calculate an array d such that d_i is the number of elements in W which are multiples of i. The pseudocode is shown below:

for i from 1 to MAX:
	j = i
	while j <= MAX:
		d[i] += c[j]
		j += i

The total time can be approximated by \sum_{i = 1}^{N} \frac{N}{i}, which is well-known to be equal to O(N \log N) (approximate the sum with an integral).

Next, set d_i to d_i^2, so d_i now represents the number of pairs of elements in W such that both elements of the pair are multiples of i.

Note that d_i also represents the number of pairs such that the gcd of the first element and the second element is a multiple of i. Next, we want to find e_i, which is the number of pairs such that the gcd of the first element and the second element is exactly i. We can find e_i with complementary counting. We subtract all cases when the gcd isn’t exactly i but is a multiple of i:

for i from MAX to 1:
	e[i] = d[i]*d[i]
	j = 2*i
	while j <= MAX:
		e[i] -= e[j]
		j += i

The final sum of the gcd over all pairs can be found by summing up i\cdot e_i.

The total time complexity is O(N \log N), where N also represents the maximum value in W.

SOLUTIONS:

Setter's Solution

#include <bits/stdc++.h>
using namespace std;
 
const int maxn = 300005;
const int maxv = 500005;
const int mod = 998244353;
 
int add(int a, int b) {return (a + b) >= mod ? a + b - mod : a + b;}
int sub(int a, int b) {return add(a, mod - b);}
int mul(int a, int b) {return a * 1LL * b % mod;}
int bigMod(int n, int r){
    int res = 1;
    int cur = n;
    while(r){
        if(r & 1) res = mul(res, cur);
        cur = mul(cur, cur);
        r >>= 1;
    }
    return res;
}
int Div(int a, int b) {return mul(a, bigMod(b, mod-2));}
 
namespace NumberTheory{
    int mob[maxv];
    int phi[maxv];
 
    void sieve(){
        mob[1] = 1;
        for(int i=1; i<maxv; i++){
            for(int j=i; j<maxv; j+=i){
                if(j > i) mob[j] -= mob[i];
            }
        }
 
        for(int i=1; i<maxv; i++){
            for(int j=i; j<maxv; j+=i){
                phi[j] += i * mob[j / i];
            }
        }
    }
}
using namespace NumberTheory;
 
int a[maxn];
int occ[maxv];
int cnt[maxv];
 
int main(){
    sieve();
 
    int t;
    scanf("%d", &t);
 
    for(int cs=1; cs<=t; cs++){
        int n, v;
        scanf("%d", &n);
 
        for(int i=1; i<=n; i++) {scanf("%d", &v); occ[v]++;}
        for(int v=1; v<maxv; v++) for(int w=v; w<maxv; w+=v) cnt[v] += occ[w];
 
        int P = 0;
        for(int i=1; i<maxv; i++){
            int curr = cnt[i];
            curr = mul(curr, curr);
            P = add(P, mul(curr, phi[i]));
        }
        P = Div(P, mul(n, n));
 
        int Q = 0;
        for(int i=1; i<=n; i++) Q = add(Q, mul(i, i));
        Q = Div(Q, n);
 
        printf("%d\n", mul(P, Q));
        for(int v=1; v<maxv; v++) occ[v] = 0, cnt[v] = 0;
    }
}

Tester's Solution

//raja1999
 
//#pragma comment(linker, "/stack:200000000")
//#pragma GCC optimize("Ofast")
//#pragma GCC target("sse,sse2,sse3,ssse3,sse4,avx,avx2")
 
#include <bits/stdc++.h>
#include <vector>
#include <set>
#include <map>
#include <string> 
#include <cstdio>
#include <cstdlib>
#include <climits>
#include <utility>
#include <algorithm>
#include <cmath>
#include <queue>
#include <stack>
#include <iomanip> 
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp> 
//setbase - cout << setbase (16)a; cout << 100 << endl; Prints 64
//setfill -   cout << setfill ('x') << setw (5); cout << 77 <<endl;prints xxx77
//setprecision - cout << setprecision (14) << f << endl; Prints x.xxxx
//cout.precision(x)  cout<<fixed<<val;  // prints x digits after decimal in val
 
using namespace std;
using namespace __gnu_pbds;
#define f(i,a,b) for(i=a;i<b;i++)
#define rep(i,n) f(i,0,n)
#define fd(i,a,b) for(i=a;i>=b;i--)
#define pb push_back
#define mp make_pair
#define vi vector< int >
#define vl vector< ll >
#define ss second
#define ff first
#define ll long long
#define pii pair< int,int >
#define pll pair< ll,ll >
#define sz(a) a.size()
#define inf (1000*1000*1000+5)
#define all(a) a.begin(),a.end()
#define tri pair<int,pii>
#define vii vector<pii>
#define vll vector<pll>
#define viii vector<tri>
#define mod (998244353LL)
#define pqueue priority_queue< int >
#define pdqueue priority_queue< int,vi ,greater< int > >
#define int ll
 
typedef tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
ordered_set;
 
 
//std::ios::sync_with_stdio(false);
 
int power(int a,int b){
	int res=1;
	while(b>0){
		if(b%2){
			res*=a;
			res%=mod;
		}
		a*=a;
		a%=mod;
		b/=2;
	}
	return res;
}
int cnt[1123456],w[1123456],gg[1123456],haha[1123456];
main(){
	std::ios::sync_with_stdio(false); cin.tie(NULL);
	int t,i,j;
	cin>>t;
	while(t--){
		int n;
		cin>>n;
		rep(i,n){
			cin>>w[i];
		}
		int s=0,ans=0;
		for(i=0;i<n;i++){
			haha[w[i]]++;
			s+=w[i];
			s%=mod;
		}
		for(i=1;i<=500005;i++){
			cnt[i]=0;
			for(j=i;j<=500005;j+=i){
				cnt[i]+=haha[j];
			}
		}
		fd(i,500005,1){
			gg[i]=cnt[i]*(cnt[i]-1);
			for(j=i*2;j<=500005;j+=i){
				gg[i]-=gg[j];
			}
			s+=(gg[i]*i)%mod;
			s%=mod;
		}
		f(i,1,n+1){
			ans+=(i*i)%mod;
			ans%=mod;
		}
		ans*=s;
		ans%=mod;
		ans*=power((n*n*n)%mod,mod-2);
		ans%=mod;
		cout<<ans<<endl;
		rep(i,n){
			haha[w[i]]--;
		}
	}
	return 0;
}

Editorialist's Solution

#include <bits/stdc++.h>
using namespace std;

#define ll long long

const int mxN=3e5, mxW=5e5, M=998244353;
int n;
ll iv[mxN+1], c[mxW+1], d[mxW+1];

void solve() {
	//input
	cin >> n;
	memset(c, 0, 8*(mxW+1));
	for(int i=0, w; i<n; ++i)
		cin >> w, ++c[w];

	//find # multiples for each i
	for(int i=1; i<=mxW; ++i)
		for(int j=i; (j+=i)<=mxW; )
			c[i]+=c[j];
	//find sum of gcd of pairs
	ll ans=0;
	for(int i=mxW; i; --i) {
		d[i]=c[i]*c[i];
		//subtract those with bigger gcd
		for(int j=i; (j+=i)<=mxW; )
			d[i]-=d[j];
		//add to ans
		ans=(ans+d[i]*i)%M;
	}

	//multiply by n*(n+1)*(2*n+1)/6/n^3
	cout << ans*(n+1)%M*(2*n+1)%M*iv[6]%M*iv[n]%M*iv[n]%M << "\n";
}

int main() {
	ios::sync_with_stdio(0);
	cin.tie(0);

	//modular inverses
	iv[1]=1;
	for(int i=2; i<=mxN; ++i)
		iv[i]=M-M/i*iv[M%i]%M;

	int t;
	cin >> t;
	while(t--)
		solve();
}

Please give me suggestions if anything is unclear so that I can improve. Thanks

induber · March 22, 2020, 10:55pm

well, i have a different solution for this,lets calculate how many times an element will occur in all the possible distribution of the elements in A, let it be x. same will happen for B. So each element will occur in B x times, so each GCD pair will occur x*x times.

So our final answer will be {(1/N) * ( Summation (i = 1 to N) [ ((1/N)^(2 * i)) * `(x*x) * (all pair gcd sum)])}.
where x = i * (power(n, i) - (n-1)*power(n, i-1)) .

Solution Link

tarun_6699 · March 23, 2020, 11:53am

I am unable to understand the expression given in the question please explain it.
∑∑ gcd(A i,B j ).

sunil5798 · March 23, 2020, 12:08pm

if anybody knows link to editorial for finding summation of gcd(Wi,Wj), please share.

tmwilliamlin · March 23, 2020, 12:18pm

Just in case you don’t know, if you press explanation it shows the explanation for finding the summation of gcd(Wi, Wj)

ay2306 · March 23, 2020, 2:03pm

This question seemed like some very advanced mathematical problem but editorial made it seem like simple math. Great editorial.

divyansh15 · April 2, 2020, 11:14am

First, we will calculate a frequency array cc of all the elements in WW. Then, we will calculate an array dd such that d_id
i

is the number of elements in WW which are multiples of ii. The pseudocode is shown below:

for i from 1 to MAX:
j = i
while j <= MAX:
d[i] += c[j]
j += i
The total time can be approximated by \sum_{i = 1}^{N} \frac{N}{i}∑
i=1
N

i
N

, which is well-known to be equal to O(N \log N)O(NlogN) (approximate the sum with an integral).

Next, set d_id
i

to d_i^2d
i
2

, so d_id
i

now represents the number of pairs of elements in WW such that both elements of the pair are multiples of ii.

Note that d_id
i

also represents the number of pairs such that the gcd of the first element and the second element is a multiple of ii. Next, we want to find e_ie
i

, which is the number of pairs such that the gcd of the first element and the second element is exactly ii. We can find e_ie
i

with complementary counting. We subtract all cases when the gcd isn’t exactly ii but is a multiple of ii:

for i from MAX to 1:
e[i] = d[i]d[i]
j = 2i
while j <= MAX:
e[i] -= e[j]
j += i
The final sum of the gcd over all pairs can be found by summing up I * e[I]

divyansh15 · April 2, 2020, 11:15am

Written in editorial itself.

nemesis1 · April 11, 2020, 9:21am

@tmwilliamlin @induber @ay2306
why score is expected value of gcd(a[i],b[j])*k^2?
how does expected value term come from score expression ?

ay2306 · April 11, 2020, 10:13am

There are 2K chocolates chosen from N chocolates and written in two array and all could be choosen any number of time with equal probability then those. Thus for any value GCD(A_i,B_j) seriously have equal probability to be selected from sequence

Like if chocolates were C_1,C_2,...,C_N then A_i can be anything from C with equal probability same for B_j.

Thus for any fixed k,
Expected value of \sum_{i=1}^{k}\sum_{j=1}^{k}GCD(A_i,B_j) is \sum_{i=1}^{k}\sum_{j=1}^{k}E(GCD(A_i,B_j) )

E(GCD(A_i,B_j)) is equal as for all i and j as all (A_i,B_j) has equal probability to be any pair (x,y) as all are choosen randomly and independently

Therefore \sum_{i=1}^{k}\sum_{j=1}^{k}E(GCD(A_i,B_j) ) = k^2*E(GCD(A_i,B_j) )

But is k is also not fixed, then ans = E(k^2)*E(GCD(A_i,B_j))

I am not that good at math, so do tell me if I am wrong.

nemesis1 · April 11, 2020, 1:56pm

Thank you very much
I really appreciate writing back

flyingkebab · April 18, 2020, 1:29pm

Man, that wasn’t trivial at all, the UI should seriously improve xD