PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: astthecoder
Tester: satyam_343
Editorialist: iceknight1093
DIFFICULTY:
2889
PREREQUISITES:
Dynamic programming, expected value, the inclusion-exclusion principle
PROBLEM:
You’re given an integer N.
For an integer 1 \leq x \leq N, consider the following process:
- Pick a random integer y between 1 and N
- Replace x by \gcd(x, y).
Let \mathbb{E}[x] be the expected number of moves this process takes till x reaches 1.
For each 1 \leq x \leq N, compute \mathbb{E}[x].
EXPLANATION:
Writing out the expected value for a single x, we see that:
This lends itself to a rather natural (if slow) recursive approach, because \gcd(x, y) \leq x.
The only issue is those y for which \gcd(x, y) = x, since they lead to ‘loops’.
However, such y lead to an \mathbb{E}[x] term anyway, so they can be taken care of with a bit of algebra.
Suppose there are k such y. Then,
The right-hand side here now depends only on values strictly less than x, so a dynamic programing approach can compute it.
Once the RHS is known, \mathbb{E}[x] is easily found.
However, it’s still \mathcal{O}(N^2), so we need to speed it up.
Let’s look at the expression we want to compute.
The expression we want to compute is \sum_{\gcd(x, y) \neq x} \mathbb{E}[\gcd(x, y)].
Let’s fix g = \gcd(x, y) and try to find the number of 1 \leq y \leq N such that \gcd(x, y) = g.
If we know this, we can add this count multiplied by \mathbb{E}[\gcd(x, y)] to the overall sum.
g has to be a divisor of x, which greatly limits the number of g we need to consider.
Writing x = g\cdot s and y = g\cdot t, we have \gcd(g\cdot s, g\cdot t) = g, which means \gcd(s, t) = 1.
Further, g\cdot t \leq N, meaning t \leq \frac{N}{g}
So, we really just want to compute the number of integers in the range [1, \frac{N}{g}] that are coprime to s = \frac{x}{g}.
That can be done using inclusion-exclusion on the prime factors of s.
Details
Let M = \frac{N}{g} be our upper bound.
Let the distinct prime factors of s be p_1, p_2, \ldots, p_k.
Then, we have the following:
- Initially, pretend everything in [1, M] is coprime to s.
- Anything not coprime to s must have some p_i as a factor.
So, subtract all multiples of p_1, p_2, \ldots, p_k from the count.
These are \left \lfloor \frac{M}{p_1} \right\rfloor, \left \lfloor \frac{M}{p_2} \right\rfloor, \ldots, \left \lfloor \frac{M}{p_k} \right\rfloor respectively. - However, now multiples of, for example, p_1p_2 were counted twice.
So, for each pairwise product of primes, add back in the count of its multiples. - Then, you’ll need to subtract the count of multiples of all triplet products, and so on.
In general, we go through each of the 2^k possibilities of products of prime factors, and either add or subtract the count of its multiples depending on how many primes are involved in the product.
A slightly more detailed explanation can be found here.
Notice that if s has k distinct prime factors, this approach takes 2^k or 2^k \cdot k steps, depending on implementation.
Since N \leq 5\cdot 10^5, any integer \leq N has at most 6 distinct prime factors, so k \leq 6 here, which is pretty small.
We do the 2^k operation for each pair of (number, divisor) below N.
It’s well-known that there are \mathcal{O}(N\log N) such pairs, so a simple upper bound on complexity is \mathcal{O}(2^6 \cdot N\log N).
However, in practice most integers have far fewer than 6 prime factors, so the constant factor is a lot better and your code should run fast.
In particular, even with a 2^k \cdot k algorithm that goes through bitmasks, the number of operations is about 10^8 which is not an issue for C++.
A 2^k algorithm for each pair uses less than 4\cdot 10^7 operations.
TIME COMPLEXITY
\mathcal{O}(2^6 \cdot N\log N) per testcase, but with fairly low constant factor.
CODE:
Author's code (C++)
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;
#define ll long long int
#define pb push_back
#define all(x) x.begin(),x.end()
#define Max 10000000000000000
template <typename T>
using ordered_set = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;
template <typename T>
using min_heap=priority_queue<T, vector<T>, greater<T>>;
bool chk[500001];
vector<ll> pc[500001];
ll n,dp[500001],mod=1000000007;
ll bigmod(ll v,ll p){
if(!p) return 1;
if(p%2) return (v*bigmod(v,p-1))%mod;
ll temp=bigmod(v,p/2);
return (temp*temp)%mod;
}
int main()
{
//freopen("out/k.in","r",stdin);
//freopen("out/t.out","w",stdout);
scanf("%lli",&n);
for(ll i=1;i<=n;i++) pc[i].pb(1);
for(ll i=2;i<=n;i++){
if(chk[i]) continue;
for(ll j=i;j<=n;j+=i){
chk[j]=1;
ll temp=pc[j].size();
for(ll k=0;k<temp;k++){
pc[j].pb(-i*pc[j][k]);
}
}
}
for(ll i=1;i<=n;i++){
if(i!=1){
dp[i]=(dp[i]+n)%mod;
dp[i]=(dp[i]*bigmod(n-n/i,mod-2))%mod;
}
for(ll j=2*i;j<=n;j+=i){
ll d=n/i;
ll cp=0;
for(ll k=0;k<pc[j/i].size();k++){
cp+=d/pc[j/i][k];
}
//cout<<j<<" "<<n<<" "<<i<<" "<<cp<<endl;
dp[j]=(dp[j]+cp*dp[i])%mod;
}
printf("%lli ",dp[i]);
}
printf("\n");
return 0;
}
Tester's code (C++)
#include <bits/stdc++.h>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
using namespace std;
#define ll long long
const ll INF_MUL=1e13;
const ll INF_ADD=2e18;
#define pb push_back
#define mp make_pair
#define nline "\n"
#define f first
#define s second
#define pll pair<ll,ll>
#define all(x) x.begin(),x.end()
#define vl vector<ll>
#define vvl vector<vector<ll>>
#define vvvl vector<vector<vector<ll>>>
#ifndef ONLINE_JUDGE
#define debug(x) cerr<<#x<<" "; _print(x); cerr<<nline;
#else
#define debug(x);
#endif
void _print(ll x){cerr<<x;}
void _print(char x){cerr<<x;}
void _print(string x){cerr<<x;}
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
template<class T,class V> void _print(pair<T,V> p) {cerr<<"{"; _print(p.first);cerr<<","; _print(p.second);cerr<<"}";}
template<class T>void _print(vector<T> v) {cerr<<" [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T>void _print(set<T> v) {cerr<<" [ "; for (T i:v){_print(i); cerr<<" ";}cerr<<"]";}
template<class T>void _print(multiset<T> v) {cerr<< " [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T,class V>void _print(map<T, V> v) {cerr<<" [ "; for(auto i:v) {_print(i);cerr<<" ";} cerr<<"]";}
typedef tree<ll, null_type, less<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;
typedef tree<ll, null_type, less_equal<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_multiset;
typedef tree<pair<ll,ll>, null_type, less<pair<ll,ll>>, rb_tree_tag, tree_order_statistics_node_update> ordered_pset;
//--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
const ll MOD=1e9+7;
const ll MAX=500500;
ll binpow(ll a,ll b,ll MOD){
ll ans=1;
a%=MOD;
while(b){
if(b&1)
ans=(ans*a)%MOD;
b/=2;
a=(a*a)%MOD;
}
return ans;
}
ll inverse(ll a,ll MOD){
return binpow(a,MOD-2,MOD);
}
void solve(){
ll n; cin>>n;
vector<ll> dp(n+5,0),spf(n+5);
iota(all(spf),0);
for(ll i=2;i<=n;i++){
for(ll j=i;j<=n;j+=i){
spf[j]=min(spf[j],i);
}
}
auto getv=[&](ll n,ll p,ll g){
if(p%g){
return 0ll;
}
n/=g;
p/=g;
vector<ll> factors;
while(p!=1){
ll x=spf[p];
factors.push_back(x);
while((p%x)==0){
p/=x;
}
}
ll len=factors.size(),now=0;
for(ll mask=0;mask<(1<<len);mask++){
ll div=1;
for(ll j=0;j<len;j++){
if(mask&(1<<j)){
div*=(-factors[j]);
}
}
now+=n/div;
}
return now;
};
for(ll i=1;i<=n;i++){
debug(mp(i,dp[i]));
if(i!=1){
dp[i]+=n;
dp[i]=(dp[i]*inverse(n-getv(n,i,i),MOD))%MOD;
}
for(ll j=i+i;j<=n;j+=i){
dp[j]=(dp[j]+dp[i]*getv(n,j,i))%MOD;
}
cout<<dp[i]<<" \n"[i==n];
}
return;
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
#ifndef ONLINE_JUDGE
freopen("input.txt", "r", stdin);
freopen("output.txt", "w", stdout);
freopen("error.txt", "w", stderr);
#endif
ll test_cases=1;
//cin>>test_cases;
while(test_cases--){
solve();
}
cerr<<"Time:"<<1000*((double)clock())/(double)CLOCKS_PER_SEC<<"ms\n";
}
Editorialist's code (Python)
mod = 10**9 + 7
n = int(input())
masks = [ [1] for _ in range(n+1)]
invs = [pow(i, mod-2, mod) for i in range(n+1)]
for p in range(2, n+1):
if len(masks[p]) > 1: continue
for x in range(p, n+1, p):
sz = len(masks[x])
for i in range(sz): masks[x].append(-masks[x][i] * p)
def coprime(a, N): # The count of integers in [1, n] that are coprime to a
ret = 0
for x in masks[a]:
if x > 0: ret += N // x
else: ret -= N // -x
return ret
ops = 0
dp = [0]*(n+1)
for d in range(1, n+1):
if d > 1:
dp[d] += 1
dp[d] = dp[d] * invs[n - n//d] % mod * n % mod
for x in range(2*d, n+1, d):
dp[x] += coprime(x//d, n//d) * invs[n] % mod * dp[d] % mod
if dp[x] >= mod: dp[x] -= mod
print(*dp[1:])