# DIVSORT - Editorial

Author: Nishank Suresh
Testers: Utkarsh Gupta, Jatin Garg
Editorialist: Nishank Suresh

2887

# PREREQUISITES:

Factorization, Sieve of Eratosthenes, Dynamic programming, Binary search/2-pointers

# PROBLEM:

Given an array A, in one move you can divide any element of it by one of its prime factors.
Find the minimum number of moves to make the array non-decreasing.

# EXPLANATION:

A few greedy strategies might come to mind, but none of them really work.

Instead, let’s try to use dynamic programming.
Let dp_{i, x} denote the minimum number of moves to make A_1 \leq A_2 \leq \ldots \leq A_i, such that A_i = x.
Transitions are fairly easy to see:

dp_{i, x} = cost(A_i, x) + \min_{1 \leq y \leq x} dp_{i-1, y}

where cost(a, b) denotes the smallest number of moves to convert a to b when dividing by primes.

There are a couple of things to do now:

• Figure out how to compute cost(a, b)
• Optimize the dp computation to work enough.

### Computing cost

Note that cost(a, b) depends solely on the prime factorizations of a and b. In particular,

• b must be a factor of a
• Exactly one prime can be removed from a at each step
• So, if p(a) denotes the number of (not necessarily distinct) primes in the factorization of a, then cost(a, b) = p(a) - p(b).
• Alternately, you can see that we remove exactly \frac{a}{b} using primes, so this also equals p(\frac{a}{b}).

So, all that needs to be done is to compute p(n) for every 1 \leq n \leq 5\cdot 10^5.
This is a standard computation that can be done with the help of a sieve.

### Optimizing dp

There are two things to optimize: the number of states, and performing the transitions.

Note that dp_{i, x} only matters for x that is a factor of A_i since no other x can be reached via division.
So, all other states can be ignored. This gives an easy upper bound of about \mathcal{O}(N \sqrt M) states (where M = 5\cdot 10^5), but can be in fact bounded by 200\cdot N, since no number in the given range has more than 200 factors.

This also makes our transitions take \leq 200 steps each (since we only need to iterate over divisors of A_{i-1}, making the entire solution something like \mathcal{O}(N\cdot d^2) where d = 200.

Unfortunately, this is likely too slow, and we need to optimize it a bit. There are a couple ways to do this.

• One way is to use binary search along with prefix minimums. Note that dp_{i, x} depends exactly on some prefix minimum of dp_{i-1}. So, if the prefix minimums of dp_{i-1} are computed, the appropriate one can be found using binary search on the factors of A_{i-1}. This gives a solution in \mathcal{O}(N\cdot d\log d).
• Instead of binary search, one can also use two pointers to achieve \mathcal{O}(N\cdot d).

Finally, there is the issue of actually finding the factors. Once again, there are two ways to do this:

• Directly factorize each A_i in \mathcal{O}(\sqrt M).
• Or, once again use a sieve to precompute the divisors of all numbers \leq M.

The second method is faster and is what is intended, however the first method can also pass if written well.
Note that you might TLE if you use both sqrt factorization and binary search, but optimizing any one of them should be enough to get AC.

# TIME COMPLEXITY

\mathcal{O}(M\log M) precomputation followed by \mathcal{O}(N\cdot D) per test case where M = 5\cdot 10^5 and D \approx 200 is the maximum number of factors a number \leq M can have.

# CODE:

Setter's code (C++) (2-pointers)
#include "bits/stdc++.h"
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
ios::sync_with_stdio(false); cin.tie(0);

const int MX = 5e5 + 10;
vector<basic_string<int>> facs(MX);
vector<int> val(MX);
for (int i = 1; i < MX; ++i) for (int j = i; j < MX; j += i) {
facs[j].push_back(i);
if (j > i) val[j] = 1 + val[i];
}

int t; cin >> t;
while (t--) {
int n; cin >> n;
vector<int> a(n+1);
a[0] = 1;
for (int i = 1; i <= n; ++i) {
cin >> a[i];
}

const int SZ = 210;
array<int, SZ> dp{};
int ans = 0;
for (int i = 1; i <= n; ++i) {
array<int, SZ> curdp{};
int ptr = 0, mn = INT_MAX;
ans = INT_MAX;
auto &prv = facs[a[i-1]];
auto &cur = facs[a[i]];
for (int j = 0; j < cur.size(); ++j) {
int d = cur[j];
while (ptr < prv.size()) {
if (prv[ptr] <= d) {
mn = min(mn, dp[ptr]);
++ptr;
}
else break;
}
curdp[j] = mn + val[a[i]/d];
ans = min(ans, curdp[j]);
}
swap(dp, curdp);
}
cout << ans << '\n';
}
}

Tester (rivalq)'s code (C++) (Binary search)
// Jai Shree Ram

#include<bits/stdc++.h>
using namespace std;

#define rep(i,a,n)     for(int i=a;i<n;i++)
#define ll             long long
//#define int            long long
#define pb             push_back
#define all(v)         v.begin(),v.end()
#define endl           "\n"
#define x              first
#define y              second
#define gcd(a,b)       __gcd(a,b)
#define mem1(a)        memset(a,-1,sizeof(a))
#define mem0(a)        memset(a,0,sizeof(a))
#define sz(a)          (int)a.size()
#define pii            pair<int,int>
#define hell           1000000007
#define elasped_time   1.0 * clock() / CLOCKS_PER_SEC

template<typename T1,typename T2>istream& operator>>(istream& in,pair<T1,T2> &a){in>>a.x>>a.y;return in;}
template<typename T1,typename T2>ostream& operator<<(ostream& out,pair<T1,T2> a){out<<a.x<<" "<<a.y;return out;}
template<typename T,typename T1>T maxs(T &a,T1 b){if(b>a)a=b;return a;}
template<typename T,typename T1>T mins(T &a,T1 b){if(b<a)a=b;return a;}

// -------------------- Input Checker Start --------------------

long long readInt(long long l, long long r, char endd)
{
long long x = 0;
int cnt = 0, fi = -1;
bool is_neg = false;
while(true)
{
char g = getchar();
if(g == '-')
{
assert(fi == -1);
is_neg = true;
continue;
}
if('0' <= g && g <= '9')
{
x *= 10;
x += g - '0';
if(cnt == 0)
fi = g - '0';
cnt++;
assert(fi != 0 || cnt == 1);
assert(fi != 0 || is_neg == false);
assert(!(cnt > 19 || (cnt == 19 && fi > 1)));
}
else if(g == endd)
{
if(is_neg)
x = -x;
if(!(l <= x && x <= r))
{
cerr << l << ' ' << r << ' ' << x << '\n';
assert(false);
}
return x;
}
else
{
assert(false);
}
}
}

string readString(int l, int r, char endd)
{
string ret = "";
int cnt = 0;
while(true)
{
char g = getchar();
assert(g != -1);
if(g == endd)
break;
cnt++;
ret += g;
}
assert(l <= cnt && cnt <= r);
return ret;
}

long long readIntSp(long long l, long long r) { return readInt(l, r, ' '); }
long long readIntLn(long long l, long long r) { return readInt(l, r, '\n'); }
string readStringSp(int l, int r) { return readString(l, r, ' '); }
void readEOF() { assert(getchar() == EOF); }

vector<int> readVectorInt(int n, long long l, long long r)
{
vector<int> a(n);
for(int i = 0; i < n - 1; i++)
a[n - 1] = readIntLn(l, r);
return a;
}

// -------------------- Input Checker End --------------------

#define SIEVE

const int N = 1000050;

int lp[N+1];
int pr[N];int t=0;

void sieve(){
for (int i=2; i<N; ++i) {
if (lp[i] == 0) {
lp[i] = i;
pr[t++]=i;
}
for (int j=0; j<t && pr[j]<=lp[i] && i*pr[j]<N; ++j)
lp[i * pr[j]] = pr[j];
}
}

int cnt[N];
vector<int> d[N];

int solve(){
static int sum_n = 0;
sum_n += n;
assert(sum_n <= 2e5);

int ans = 1e9;

vector<pii> dp;
for(auto j:d[a[n - 1]]){
dp.push_back({j,cnt[a[n - 1]/j]});
}

for(int i = n - 2; i >= 0; i--){
vector<pii> f;
int m = dp.size();
for(int i = m - 2; i >= 0; i--)dp[i].y = min(dp[i].y,dp[i + 1].y);
for(auto k:d[a[i]]){
int mn = 1e9;
auto itr = lower_bound(all(dp),make_pair(k,0));
if(itr != dp.end()){
f.push_back({k,cnt[a[i]/k] + itr->y});
}
}
swap(f,dp);

}
for(auto i:dp)mins(ans,i.y);
cout << ans << endl;

return 0;
}
signed main(){
ios_base::sync_with_stdio(0);cin.tie(0);cout.tie(0);
//freopen("input.txt", "r", stdin);
//freopen("output.txt", "w", stdout);
#ifdef SIEVE
sieve();
#endif
for(int i = 2; i < N; i++){
cnt[i] = 1 + cnt[i / lp[i]];
}
for(int i = 1; i <= 5e5; i++){
for(int j = i; j <= 5e5; j += i){
d[j].push_back(i);
}
}
#ifdef NCR
init();
#endif
while(t--){
solve();
}
return 0;
}

Tester (utkarsh_25dec)'s code (C++) (2-pointers)
//Utkarsh.25dec
#include <bits/stdc++.h>
#define ll long long int
#define pb push_back
#define mp make_pair
#define mod 1000000007
#define vl vector <ll>
#define all(c) (c).begin(),(c).end()
using namespace std;
ll power(ll a,ll b) {ll res=1;a%=mod; assert(b>=0); for(;b;b>>=1){if(b&1)res=res*a%mod;a=a*a%mod;}return res;}
ll modInverse(ll a){return power(a,mod-2);}
const int N=500023;
bool vis[N];
long long readInt(long long l,long long r,char endd){
long long x=0;
int cnt=0;
int fi=-1;
bool is_neg=false;
while(true){
char g=getchar();
if(g=='-'){
assert(fi==-1);
is_neg=true;
continue;
}
if('0'<=g && g<='9'){
x*=10;
x+=g-'0';
if(cnt==0){
fi=g-'0';
}
cnt++;
assert(fi!=0 || cnt==1);
assert(fi!=0 || is_neg==false);

assert(!(cnt>19 || ( cnt==19 && fi>1) ));
} else if(g==endd){
if(is_neg){
x= -x;
}

if(!(l <= x && x <= r))
{
cerr << l << ' ' << r << ' ' << x << '\n';
assert(1 == 0);
}

return x;
} else {
assert(false);
}
}
}
string ret="";
int cnt=0;
while(true){
char g=getchar();
assert(g!=-1);
if(g==endd){
break;
}
cnt++;
ret+=g;
}
assert(l<=cnt && cnt<=r);
return ret;
}
long long readIntSp(long long l,long long r){
}
long long readIntLn(long long l,long long r){
}
}
}
vector <int> to[N];
int isprime[N];
int cntprimes[N];
void solve()
{
int A[n+1]={0};
for(int i=1;i<=n;i++)
{
if(i==n)
else
}
A[0]=1;
int dp[n+1][205];
memset(dp,0,sizeof(dp));
for(int i=1;i<=n;i++)
{
int k=0;
for(int j=0;j<to[A[i]].size();j++)
{
while(k<to[A[i-1]].size())
{
if(to[A[i-1]][k]>to[A[i]][j])
break;
k++;
}
k--;
dp[i][j]=dp[i-1][k]+cntprimes[A[i]/to[A[i]][j]];
if(j>0)
dp[i][j]=min(dp[i][j],dp[i][j-1]);
}
}
cout<<dp[n][to[A[n]].size()-1]<<'\n';
}
int main()
{
#ifndef ONLINE_JUDGE
freopen("input.txt", "r", stdin);
freopen("output.txt", "w", stdout);
#endif
ios_base::sync_with_stdio(false);
cin.tie(NULL),cout.tie(NULL);
for(int i=1;i<N;i++)
isprime[i]=i;
for(int i=2;i<N;i++)
{
if(isprime[i]!=i)
continue;
for(int j=i;j<N;j+=i)
isprime[j]=i;
}
cntprimes[1]=0;
for(int i=2;i<N;i++)
cntprimes[i]=cntprimes[i/isprime[i]]+1;
for(int i=1;i<N;i++)
{
for(int j=i;j<N;j+=i)
to[j].pb(i);
}
while(T--)
solve();
assert(getchar()==-1);
cerr << "Time : " << 1000 * ((double)clock()) / (double)CLOCKS_PER_SEC << "ms\n";
}

1 Like

Isn’t the time limit too strict for this problem? I did the recursive implementation of the same approach in O(N*d) per test case, but still, it’s giving tle
My solution : Solution: 74215854 | CodeChef

Haven’t looked too deeply into your code, but I can say that the TL was somewhat lenient.

My code took ~0.55s, and during testing we found out that both \mathcal{O}(Nd\log d) using binary search and \mathcal{O}(N\sqrt {5\times 10^5}) that brute-factorized each number got AC too (in something like 1 second and 1.5 seconds respectively).

Unfortunately lowering the limits much more or increasing TL would have led to constant-factor optimized \mathcal{O}(Nd^2) passing which I wanted to avoid.