GCDPAR - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: rudra_1232
Tester: kingmessi
Editorialist: iceknight1093

DIFFICULTY:

Easy-Med

PREREQUISITES:

Dynamic programming, finding subarray GCDs

PROBLEM:

A valid split of an array is a partition of its elements into contiguous non-empty subarrays, such that the i-th subarray from the left has a GCD of exactly i.
For an array B, F(B) denotes the number of its valid splits.

You’re given an array A. Compute \sum_{i=1}^N\sum_{j=i}^N F(A[i, j]), modulo 10^9 + 7.

EXPLANATION:

We relax the definition of a valid split a bit, and just say that a partition into subarrays is valid if the GCDs of the subarrays are increasing by 1 from left to right.
So for example if the subarray GCDs are 10, 11, 12, \ldots that’s still valid; though in the end we’re only interested in those that start with 1.

First, let’s ignore the “sum over subarrays” part, and just try to count the number of valid splits starting with 1 for a single array.
One way to do this is using dynamic programming.

Let dp[i][x] denote the number of valid splits of the suffix [A_i, A_{i+1}, \ldots, A_N], such that the first subarray has a GCD of x.
This is rather easy to compute: for each j\geq i such that \gcd(A_i, A_{i+1}, \ldots, A_j) = x, add dp[j+1][x+1] to the answer - essentially, fix the first subarray with a GCD of x, and then see that the remainder is simply obtained by any valid split of the suffix starting from j+1.

This DP has \mathcal{O}(N^2) states and \mathcal{O}(N) transitions from each of them, which is of course very slow.

To optimize it, we can use the (well-known) observation that there are only \mathcal{O}(\log A_i) distinct subarray GCDs starting at index i, and for each of these GCDs, the possible right endpoints will form a contiguous range. This is because when you look at the GCD starting at index i and moving right, if the GCD ever changes it must move to a strict factor of its current value - meaning it’ll at least halve.

This immediately reduces the number of states in the DP to \mathcal{O}(N\log M) where M = \max(A) - after all, dp[i][x] can be non-zero if and only if there’s a subarray with GCD x starting at i in the first place.

Next, for the transitions, note that to compute dp[i][x], we essentially compute the sum of dp[j+1][x+1] across some range of j.
This can be optimized using prefix sums (though a bit of careful implementation is needed - to preserve the low number of states, the prefix sums also have to be built on a sparse array, so you might need to binary search to find the appropriate range to sum).

The final answer is then dp[1][1], and we’ve found it in \mathcal{O}(N\log M) time (maybe with an additional \log N factor depending on implementation).


Now, we need to think about adapting this algorithm to compute the answer across all subarrays.

One way to do that is to modify what the DP stores a bit.
Let’s redefine dp[i][x] to be: the sum of the number of valid splits starting at i with starting value x, across all subarrays starting at i.

That is, let f(i, j, x) be a function denoting the number of valid splits of [A_i, A_{i+1}, \ldots, A_j] whose first segment has GCD x.
Then, dp[i][x] will hold the sum f(i, i, x) + f(i, i+1, x) + \cdots + f(i, N, x).

Note that the number of states is exactly the same as the previous DP, at \mathcal{O}(N\log M).
So, if we can process transitions quickly enough, we’ll be done: after all, the answer we’re looking for is exactly \displaystyle\sum_{i=1}^N dp[i][1].

So, let’s look at the transitions for dp[i][x] under the new definition.
Let [L, R] be the range of indices for which L \leq j \leq R \implies \gcd(A_i, \ldots, A_j) = x.
If we fix the index j chosen in this range at the endpoint of the first segment,

  • Subarrays ending before j don’t need to be considered, obviously.
  • We have a valid split for the subarray [A_i, A_{i+1}, \ldots, A_j], so add 1 to the answer.
  • For a subarray ending after j, any valid split can be uniquely obtained as a result of choosing a split starting from j+1 with value x+1, so we add dp[j+1][x+1] to the answer.

Notice that this is remarkably similar to the first set of transitions: we add 1 for each j (for a total of (R-L+1)), but apart from that it’s just a sum of dp[j+1][x+1] across some range of j.
This, as done previously, can be computed quickly by maintaining prefix (or suffix) sums.

This allows us to compute all the dp[i][x] values in \mathcal{O}(N\log M) or \mathcal{O}(N\log M\log N) time, which is fast enough.

TIME COMPLEXITY:

\mathcal{O}(N\log M\log N) per testcase, where M = \max(A) \leq 10^9.

CODE:

Author's code (C++)
#include"bits/stdc++.h" 
using namespace std; 
#ifdef ONLINE_JUDGE
#define debug(...) 42
#else
#include "puppet.h"
#endif
#include<ext/pb_ds/assoc_container.hpp>
#include<ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
typedef long double ld;
typedef long long ll;
typedef unsigned long long ull;
#define pb push_back
#define pob pop_back
#define bplc __builtin_popcountll 
#define bpc __builtin_popcount
#pragma GCC optimize("Ofast")
#define F first
#define S second
#define all(v) v.begin(),v.end()
#define unique(v) (v).erase(unique((v).begin(), (v).end()), (v).end())

template<class T> using pbds= tree<T, null_type, less<T>,rb_tree_tag, tree_order_statistics_node_update>;
template<class T> using min_heap=priority_queue<T, vector<T>, greater<T>>;
// *a.find_by_order(x)-->gives xth element considering 0 indexing
// a.order_of_key(x)-->gives number of elements strictly smaller than x


// struct custom_hash {
//     static uint64_t splitmix64(uint64_t x) {
//         // http://xorshift.di.unimi.it/splitmix64.c
//         x += 0x9e3779b97f4a7c15;
//         x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
//         x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
//         return x ^ (x >> 31);
//     }
 
//     size_t operator()(uint64_t x) const {
//         static const uint64_t FIXED_RANDOM = chrono::steady_clock::now().time_since_epoch().count();
//         return splitmix64(x + FIXED_RANDOM);
//     }
// };



const int N=2e5+5;
const int MM=20;
int sparse[N][MM];
ll M=1e9+7;
ll a[N],m[N+200],way[N+200];
vector<pair<ll,ll>>store[N+1],wstore[N+1];
int st;

void solve(){


int n;
cin>>n;
// debug(n);
// int a[n];
ll tot=0,tt=0;
st+=n;
for(int i=0;i<n;i++){
    cin>>a[i];
    // debug(a[i]);
    store[i].clear();
    wstore[i].clear();
sparse[i][0]=a[i];
}
store[n].clear();
for(int i=0;i<=n+100;i++){m[i]=0;way[i]=0;}


int lg=__lg(n);
        for(int j=1;j<=lg;j++){ 
for(int i=0;i<n;i++){
sparse[i][j]=__gcd(sparse[i][j-1],sparse[min(n,i+(1<<(j-1)))][j-1]);
}
}

auto qre=[&](int l,int r){
int k=__lg(r-l+1);
return __gcd(sparse[l][k],sparse[r-(1<<(k))+1][k]);
};





m[0]=1;
way[0]=1;
for(int i=0;i<n;i++){

int l=i,r=n;
int x=a[i];

while(l!=n){

    if(x-1<=n&&m[x-1]){
        wstore[l].pb({way[x-1],x});
        store[l].pb({x,1});
    }

bool ok=false;
r=n;


while(l<r){
int mid=(l+r)>>1;
    if(qre(i,mid)==x){
    l=mid+1;
    }
    else {
        ok=true;
        r=mid;
    }
}

if(!ok)l=n;

if(x-1<=n&&m[x-1]){
    wstore[l].pb({-way[x-1],x});
    store[l].pb({x,-1});
}

  if(l!=n)
x=qre(i,l);

}

for(auto p:store[i]){

    // debug(p,i);
m[p.F]+=p.S;
}

for(auto p:wstore[i]){
  
    tot=(tot-way[p.S])%M;
    way[p.S]=(way[p.S]+p.F)%M;
    tot=(tot+way[p.S])%M;
}

tt=(tot+tt)%M;
// debug(store[i]);

}
ll ans=0;
// debug(store[n]);
// debug(ss);
for(auto p:store[n])ans=max(ans,p.F);

cout<<(tt%M+M)%M<<"\n";

}


int main(){ 



 ios::sync_with_stdio(0);
  cin.tie(0);


 clock_t tStart = clock();
 int T=1;
cin>>T;
while(T--){
    solve();
}
debug(st);
#ifndef ONLINE_JUDGE
        // cout<<"Time taken: "<<setprecision(10)<<((double)(clock()-tStart)/CLOCKS_PER_SEC)<<" s"<<endl;
    #endif




    return 0; 
} 
Tester's code (C++)
//Har Har Mahadev
#include<bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp> // Common file
#include <ext/pb_ds/tree_policy.hpp>
#define ll long long
// #define int long long
#define rep(i,a,b) for(int i=a;i<b;i++)
#define rrep(i,a,b) for(int i=a;i>=b;i--)
#define repin rep(i,0,n)
#define precise(i) cout<<fixed<<setprecision(i)
#define vi vector<int>
#define si set<int>
#define mii map<int,int>
#define take(a,n) for(int j=0;j<n;j++) cin>>a[j];
#define give(a,n) for(int j=0;j<n;j++) cout<<a[j]<<' ';
#define vpii vector<pair<int,int>>
#define db double
#define be(x) x.begin(),x.end()
#define pii pair<int,int>
#define pb push_back
#define pob pop_back
#define ff first
#define ss second
#define lb lower_bound
#define ub upper_bound
#define bpc(x) __builtin_popcountll(x) 
#define btz(x) __builtin_ctz(x)
using namespace std;

using namespace __gnu_pbds;

typedef tree<int, null_type, less<int>, rb_tree_tag,tree_order_statistics_node_update> ordered_set;
typedef tree<pair<int, int>, null_type,less<pair<int, int> >, rb_tree_tag,tree_order_statistics_node_update> ordered_multiset;

const long long INF=1e18;
const long long M=1e9+7;
const long long MM=998244353;
  


struct input_checker {
    string buffer;
    int pos;

    const string all = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    const string number = "0123456789";
    const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string lower = "abcdefghijklmnopqrstuvwxyz";

    input_checker() {
        pos = 0;
        while (true) {
            int c = cin.get();
            if (c == -1) {
                break;
            }
            buffer.push_back((char) c);
        }
    }

    int nextDelimiter() {
        int now = pos;
        while (now < (int) buffer.size() && !isspace(buffer[now])) {
            now++;
        }
        return now;
    }

    string readOne() {
        assert(pos < (int) buffer.size());
        int nxt = nextDelimiter();
        string res;
        while (pos < nxt) {
            res += buffer[pos];
            pos++;
        }
        return res;
    }

    string readString(int minl, int maxl, const string &pattern = "") {
        assert(minl <= maxl);
        string res = readOne();
        assert(minl <= (int) res.size());
        assert((int) res.size() <= maxl);
        for (int i = 0; i < (int) res.size(); i++) {
            assert(pattern.empty() || pattern.find(res[i]) != string::npos);
        }
        return res;
    }

    int readInt(int minv, int maxv) {
        assert(minv <= maxv);
        int res = stoi(readOne());
        assert(minv <= res);
        assert(res <= maxv);
        return res;
    }

    long long readLong(long long minv, long long maxv) {
        assert(minv <= maxv);
        long long res = stoll(readOne());
        assert(minv <= res);
        assert(res <= maxv);
        return res;
    }

    auto readInts(int n, int minv, int maxv) {
        assert(n >= 0);
        vector<int> v(n);
        for (int i = 0; i < n; ++i) {
            v[i] = readInt(minv, maxv);
            if (i+1 < n) readSpace();
        }
        return v;
    }

    auto readLongs(int n, long long minv, long long maxv) {
        assert(n >= 0);
        vector<long long> v(n);
        for (int i = 0; i < n; ++i) {
            v[i] = readLong(minv, maxv);
            if (i+1 < n) readSpace();
        }
        return v;
    }

    void readSpace() {
        assert((int) buffer.size() > pos);
        assert(buffer[pos] == ' ');
        pos++;
    }

    void readEoln() {
        assert((int) buffer.size() > pos);
        assert(buffer[pos] == '\n');
        pos++;
    }

    void readEof() {
        assert((int) buffer.size() == pos);
    }
}inp;

template <class S, S (*op)(S, S), S (*e)()> struct segtree {
  public:
    segtree() : segtree(0) {}
    explicit segtree(int n) : segtree(std::vector<S>(n, e())) {}
    explicit segtree(const std::vector<S>& v) : _n(int(v.size())) {
        size = 1;while(size < _n)size *= 2;
        log = __builtin_ctz(size);
        d = std::vector<S>(2 * size, e());
        for (int i = 0; i < _n; i++) d[size + i] = v[i];
        for (int i = size - 1; i >= 1; i--) {
            update(i);
        }
    }

    void set(int p, S x) {
        assert(0 <= p && p < _n);
        p += size;
        d[p] = x;
        for (int i = 1; i <= log; i++) update(p >> i);
    }

    S get(int p) const {
        assert(0 <= p && p < _n);
        return d[p + size];
    }

    S prod(int l, int r) const {
        assert(0 <= l && l <= r && r <= _n);
        S sml = e(), smr = e();
        l += size;
        r += size;

        while (l < r) {
            if (l & 1) sml = op(sml, d[l++]);
            if (r & 1) smr = op(d[--r], smr);
            l >>= 1;
            r >>= 1;
        }
        return op(sml, smr);
    }

    S all_prod() const { return d[1]; }

  private:
    int _n, size, log;
    std::vector<S> d;
    void update(int k) { d[k] = op(d[2 * k], d[2 * k + 1]); }
};


int op(int l,int r){
	return (l+r)%M;
}

int e(){
	return 0;
}

int smn = 0;
 
void solve()
{
    int n;
    // cin >> n;
    smn += n;
    n = inp.readInt(1,200'000);
    inp.readEoln();
    vi a(n);
    // take(a,n);
    repin{
    	a[i] = inp.readInt(1,1000'000'000);
    	if(i == n-1)inp.readEoln();
    	else inp.readSpace();
    }
    segtree<int,op,e> st(n);
    vector<array<int,3>> qr[n+1];//l,r,i]; gcd(l..i) to gcd(r..i) are equal to num in qr[num]
    vector<array<int,2>> ch;//[ele,pos]
    repin{
    	vector<array<int,2>> tch;
    	int sz = ch.size();
    	rep(j,0,sz){
    		if((j+1 < sz ? __gcd(ch[j+1][0],a[i]) : a[i]) == __gcd(ch[j][0],a[i])){continue;}
    		tch.pb({__gcd(ch[j][0],a[i]),ch[j][1]});
    	}
    	tch.pb({a[i],i});
    	ch = tch;
    	sz = ch.size();
    	rep(j,0,ch.size()){
    		if(ch[j][0] <= n)
    		qr[ch[j][0]].pb({(j ? ch[j-1][1]+1 : 0),ch[j][1],i});
    	}
    }


    if(qr[1].size() == 0){
    	cout << 0 << "\n";
    	return;
    }

    for(auto& [l,r,i] : qr[1])st.set(i,r-l+1);
   	int ans = st.all_prod();
   	// cout << ans << "hi\n";
    rep(e,2,n+1){
    	if(qr[e].size() == 0)break;
    	vi v;
    	for(auto& [l,r,i] : qr[e]){
    		v.pb(st.prod(max(l-1,0),r));
    	}
    	for(auto& [l,r,i] : qr[e-1]){
    		st.set(i,0);
    	}
    	reverse(be(v));
    	for(auto& [l,r,i] : qr[e]){
    		st.set(i,v.back());
    		v.pob();
    	}
    	ans += st.all_prod();
    	ans %= M;
    }
    ans += M;ans %= M;
    cout << ans << "\n";


}

signed main(){
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    cout.tie(0);
    #ifdef NCR
        init();
    #endif
    #ifdef SIEVE
        sieve();
    #endif
    int t;
    // cin >> t;
    t = inp.readInt(1,200'000);
    inp.readEoln();
    while(t--)
        solve();
    inp.readEof();
    assert(smn <= 200'000);
    return 0;
}
Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

const int mod = 1e9 + 7;

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);
    
    int t; cin >> t;
    while (t--) {
        int n; cin >> n;
        vector<int> a(n);
        for (int &x : a) cin >> x;

        vector<array<int, 2>> gcds;
        vector<vector<pair<int, int>>> positions(n+2);
        int ans = 0;

        for (int i = n-1; i >= 0; --i) {
            vector<array<int, 2>> cur_gcds = {{a[i], i}};
            for (auto [x, y] : gcds) {
                int g = gcd(x, a[i]);
                if (cur_gcds.back()[0] == g) cur_gcds.back()[1] = y;
                else cur_gcds.push_back({g, y});
            }
            gcds = cur_gcds;

            int p = i-1;
            for (auto [x, y] : gcds) {
                // [p+1, y]
                if (x <= n) {
                    int cur = y - p;
                    if (positions[x+1].size()) {
                        // Find last one that's >= p+2
                        // Subtract last one that's >= y+2
                        auto last_till = [&] (int pos) {
                            auto it = upper_bound(begin(positions[x+1]), end(positions[x+1]), pos, [&] (int j, auto ele) {
                                return ele.first < j;
                            });
                            return prev(it)->second;
                        };

                        cur += last_till(p+2) - last_till(y+2);
                        cur = (cur%mod + mod) % mod;
                    }
                    
                    if (x == 1) ans = (ans + cur) % mod;
                    positions[x].push_back({i, cur});
                    if (positions[x].size() > 1) positions[x].back().second = (positions[x][positions[x].size()-2].second + positions[x].back().second) % mod;
                }
                p = y;
            }
        }
        cout << ans << '\n';
    }
}
1 Like

Nice problem, it’s a shame I could only solve the first part before running out of time.

1 Like

I tried editorialist’s(@iceknight1093 ) code on array - 1 16 6, it gives some garbage value on codeforces custom invocation, I guess it might be some null pointer-