PRSUMARRHD - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: who
Tester: iceknight1093
Editorialist: iceknight1093

DIFFICULTY:

Easy

PREREQUISITES:

Integer factorization, dynamic programming

PROBLEM:

Given N, let R denote the number of arrays A such that:

  • 1 \le A_1 \le A_2 \le \ldots \le A_{|A|} \le N
  • A_1 + A_2 + \ldots + A_{|A|} = N
  • A_1 \times A_2 \times \ldots \times A_{|A|} = N

Compute R.

EXPLANATION:

As seen in the solution to the easy version, the value R simply equals the number of (unordered) ways of writing N as a product of some of its factors, where each factor is \gt 1.

Let’s try building up the product from small values to larger ones.

That is, suppose the factors of N are d_1, d_2, \ldots, d_k.
Then, we can try to use some copies of d_1, followed by some copies of d_2, and so on till we reach a product of exactly N.
In particular, note that at every step, our product must itself be a factor of N.

This allows us to use dynamic programming, as follows.
Define dp(i, x) to be the number of ways of obtaining a product of exactly x, using only the elements d_1, d_2, \ldots, d_i.

The transitions here are fairly simple:
dp(i, x) = dp(i-1, x) + dp(i, \frac{x}{d_i})
This is because we either don’t use any copies of d_i (in which case we’re left with the first i-1 factors), or we do use a copy of d_i, in which case we want to then make \frac{x}{d_i} from the first i factors.

The answer is simply dp(k, N).

Note that the complexity of this dynamic programming is just \mathcal{O}(k^2), where k denotes the number of divisors of N.
That’s because for the state (i, x), i is an integer from 1 to k, and x must be a factor of N (of which there are k). Transitions from each state are \mathcal{O}(1).

Since we’re dealing with N \le 10^9, numbers cannot actually have that many divisors.
In particular, the maximum possible value of k is 1344, occurring at N = 735134400.
So, \mathcal{O}(k^2) is perfectly acceptable as a complexity, especially because there are only 20 test cases in a given file.


One thing to be careful about is that the divisors of N can be quite large: large enough that they cannot be indexed using a normal array.

So, we need an additional data structure such as a map to store dp(i, x), which contributes to the complexity by itself: using std::map in C++ would give an extra \mathcal{O}(\log k) to the complexity, for example.
Alternately, the factors of N can be coordinate-compressed so that a normal array works, and then use binary search to find the appropriate index to be looked up when performing transitions - there’s still an extra \mathcal{O}(\log k) factor but with lower constant factor.

It’s possible to also eliminate the extra log factor entirely by precomputing transitions, though this was not needed to get AC.

TIME COMPLEXITY:

\mathcal{O}(\sqrt N + \text{divs}(N)^2) per testcase.

CODE:

Author's code (C++)
#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

void Solve() 
{
    int n; cin >> n;
    
    vector <int> d;
    for (int i = 1; i * i <= n; i++){
        if (n % i == 0){
            d.push_back(i);
            if (i * i != n){
                d.push_back(n / i);
            }
        }
    }
    
    sort(d.begin(), d.end());
    int m = d.size();
    
    map <int, int> dp;
    dp[1] = 1;
    
    for (auto x : d){
        if (x == 1) continue;
        for (auto y : d) if (y % x == 0){
            dp[y] += dp[y / x];
        }
    }
    
    cout << dp[n] << "\n";
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}
Editorialist's code (C++, no log)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

const int SZ = 2000;
int go[SZ][SZ];
ll dp[SZ][SZ];

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);
    
    int t; cin >> t;
    while (t--) {
        int n; cin >> n;

        vector<int> factors = {1};
        for (int i = 2; i*i <= n; ++i) {
            if (n%i == 0) {
                factors.push_back(i);
                if (i != n/i) factors.push_back(n/i);
            }
        }
        if (n > 1) factors.push_back(n);
        ranges::sort(factors);

        int k = size(factors);
        for (int i = 0; i < k; ++i) {
            int ptr = 0;
            for (int j = i; j >= 0; --j) {
                while (factors[ptr] < factors[i]/factors[j]) ++ptr;
                if (factors[ptr] * 1ll * factors[j] == factors[i]) go[i][j] = ptr;
                else go[i][j] = -1;
            }
        }
        
        dp[0][0] = 1;
        for (int i = 1; i < k; ++i) {
            for (int j = 0; j < k; ++j) {
                dp[i][j] = dp[i-1][j];
                if (i <= j and go[j][i] != -1) dp[i][j] += dp[i][go[j][i]];
            }
        }
        cout << dp[k-1][k-1] << '\n';
    }
}