ADVITIYA7 - Editorial


Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: adj_alt
Tester: apoorv_me
Editorialist: iceknight1093




Elementary number theory


Given a number X, answer Q queries of the following form:

  • Given P, find the number of Y such that \gcd(X, Y)^P = \text{lcm}(X, Y)


To solve this task, it’s helpful to look at what the condition on gcd and lcm means in terms of the prime factorizations of X and Y.
Let’s prime factorize X and Y, so that

X = p_1 ^ {a_1} p_2 ^ {a_2} \ldots p_k ^ {a_k} \\ Y = p_1 ^ {b_1} p_2 ^ {b_2} \ldots p_k ^ {b_k}

Note that we use the same set of primes for both numbers for the sake of convenience (meaning some of the a_i and/or b_i are allowed to be 0).
Then, it’s well-known that

\gcd(X, Y) = p_1 ^ {\min(a_1, b_1)} p_2 ^ {\min(a_2, b_2)} \ldots p_k ^ {\min(a_k, b_k)} \\ \text{lcm}(X, Y) = p_1 ^ {\max(a_1, b_1)} p_2 ^ {\max(a_2, b_2)} \ldots p_k ^ {\max(a_k, b_k)}

So, \gcd(X, Y)^P = \text{lcm}(X, Y) means that for each prime p_i, we want \max(a_i, b_i) = P\cdot \min(a_i, b_i).
In particular, this means that either b_i = P\cdot a_i, or b_i = \frac{a_i}{P}.

Now, all the a_i values are fixed, and can be found by just prime factorizing X (even the straightforward \mathcal{O}(\sqrt{X}) algorithm is fast enough here).
Further, Y is defined uniquely by the choices we make for each b_i (after all, a positive integer is determined uniquely by its prime factorization).
Observe that:

  • b_i = P\cdot a_i is always an available choice.
  • b_i = \frac{a_i}{P} is available if and only if a_i is a multiple of P.

So, suppose there are M values of a_i that are multiples of P.
The answer is 2^M, since only for those primes we have two options, and everything else is fixed.
There is one edge case: for P = 1, P\cdot a_i = \frac{a_i}{P}, so we only have one option for each prime (meaning the answer is 1).

Thus, for each query, all that needs to be done is to count the number of a_i that are multiples of P.
Prime factorize X once and store all the a_i; then observe that there can only be \mathcal{O}(\log X) values of a_i anyway (since X can’t have more than \log_2 X distinct prime factors), so counting the number of them that are multiples of P can be done by just iterating over them all.

Finally, since the answer is 2^M for some M \leq \log_2 X, the answer fits into a 32-bit integer easily, and the modulo is irrelevant!


\mathcal{O}(\sqrt{X} + Q\log X) per testcase.


Author's code (C++)
#include <bits/stdc++.h>
using namespace std;
// #include <ext/pb_ds/trie_policy.hpp>
// #include <ext/pb_ds/assoc_container.hpp>
// using namespace __gnu_pbds;
// template<typename T>
// using oset = tree<T, null_type, less_equal<T>, rb_tree_tag, tree_order_statistics_node_update>;
void actualcode();
#define ll long long int                          
#define endl "\n"
#pragma GCC optimize("unroll-loops")
#pragma GCC optimize("Ofast")
#pragma GCC target("fma,sse,sse2,sse3,ssse3,sse4,popcnt,abm,mmx,avx,avx2,tune=native")
int mod = 1e9 + 7;
ll max3(ll a, ll b, ll c) { return (max(max(a, b), c)); }
ll max4(ll a, ll b, ll c, ll d) { return (max(max(max(a, b), c), d)); }
ll gcd(ll a, ll b) { return b == 0 ? a : gcd(b, a % b); }
void YES() { cout << "YES" << endl; }
void NO() { cout << "NO" << endl; }
void yes() { cout << "Yes" << endl; }
void no() { cout << "No" << endl; }

int main()
    // #ifndef ONLINE_JUDGE
    //     freopen("Test_Case_3.txt","r",stdin);
    //     freopen("output8meh.txt","w",stdout);
    // #endif
int sieve[40005];
void Sieve()
    for (int i = 0; i < 40005; i++)
        sieve[i] = 1;
    for (ll i = 2; i < 40005; i++)
        if (sieve[i] == 1)
            for (ll j = i * i; j < 40005; j = j + i)
                sieve[j] = 0;
void actualcode()
    int t = 1;
    cin >> t;
    ll zero=0;
    vector<int> prs;
    for(int i=2;i<40005;i++)
        if(sieve[i]) prs.push_back(i);
    while (t--)
        long long x;
        int q;
        ll temp;
        int p;
        int cnt=0,cnt1=0;
        ll ans;
        vector<int> pws;
        for(auto it:prs)
                for(auto it:pws)

Tester's code (C++)
#ifndef LOCAL
#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")

#include <bits/stdc++.h>
using namespace std;

#ifdef LOCAL
#include "../debug.h"
#define dbg(...) "11-111"

struct input_checker {
    string buffer;
    int pos;

    const string all = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    const string number = "0123456789";
    const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string lower = "abcdefghijklmnopqrstuvwxyz";

    input_checker() {
        pos = 0;
        while (true) {
            int c = cin.get();
            if (c == -1) {
            buffer.push_back((char) c);

    int nextDelimiter() {
        int now = pos;
        while (now < (int) buffer.size() && !isspace(buffer[now])) {
        return now;

    string readOne() {
        assert(pos < (int) buffer.size());
        int nxt = nextDelimiter();
        string res;
        while (pos < nxt) {
            res += buffer[pos];
        return res;

    string readString(int minl, int maxl, const string &pattern = "") {
        assert(minl <= maxl);
        string res = readOne();
        assert(minl <= (int) res.size());
        assert((int) res.size() <= maxl);
        for (int i = 0; i < (int) res.size(); i++) {
            assert(pattern.empty() || pattern.find(res[i]) != string::npos);
        return res;

    int readInt(int minv, int maxv) {
        assert(minv <= maxv);
        int res = stoi(readOne());
        assert(minv <= res);
        assert(res <= maxv);
        return res;

    long long readLong(long long minv, long long maxv) {
        assert(minv <= maxv);
        long long res = stoll(readOne());
        assert(minv <= res);
        assert(res <= maxv);
        return res;

    auto readInts(int n, int minv, int maxv) {
        assert(n >= 0);
        vector<int> v(n);
        for (int i = 0; i < n; ++i) {
            v[i] = readInt(minv, maxv);
            if (i+1 < n) readSpace();
        return v;

    auto readLongs(int n, long long minv, long long maxv) {
        assert(n >= 0);
        vector<long long> v(n);
        for (int i = 0; i < n; ++i) {
            v[i] = readLong(minv, maxv);
            if (i+1 < n) readSpace();
        return v;

    void readSpace() {
        assert((int) buffer.size() > pos);
        assert(buffer[pos] == ' ');

    void readEoln() {
        assert((int) buffer.size() > pos);
        assert(buffer[pos] == '\n');

    void readEof() {
        assert((int) buffer.size() == pos);

int32_t main() {
    ios_base::sync_with_stdio(0);   cin.tie(0);

    input_checker input;

    int T = input.readInt(1, 1000); input.readEoln();
    while(T-- > 0) {
        int N = input.readInt(1, (int)1e9); input.readSpace();
        map<int, int> fac;
        for(int l = 2 ; l * l <= N ; ++l) {
            while(N % l == 0) {
                N /= l;
        if(N > 1)

        int NQ = input.readInt(1, 200); input.readEoln();

        for(int i = 0 ; i < NQ ; ++i) {
            int P = input.readInt(1, (int)1e9); input.readEoln();
            if(P == 1) {
                cout << 1 << '\n';
            int cnt = 0;
            for(auto &[a, b]: fac)
                cnt += b % P == 0;
            cout << (1ll << cnt) << '\n';


    return 0;
Editorialist's code (Python)
for _ in range(int(input())):
    x, q = map(int, input().split())
    pows = []
    p = 2
    while p*p <= x:
        ct = 0
        while x%p == 0:
            x //= p
            ct += 1
        if ct > 0: pows.append(ct)
        p += 1
    if x > 1: pows.append(1)
    for i in range(q):
        p = int(input())
        ans = 1
        for ai in pows:
            if p > 1 and ai%p == 0: ans *= 2

In the question no additional constraints are mentioned, so assuming that
root(x) <= 3 * 10^4
and no. of test cases, t = 10^3
the max iterations for the just the calculation of prime factors can go up to
3 * 10^7, which is generally considered to not pass if coded in python. The second part of question, that is checking for ai for all Q adds to this complexity, and the solution should not pass the tests at all.

But the editorialist’s code does pass, so is there a hidden constraint that was not mentioned in the question? If yes, please mention such constraints, because I wasted a lot of my time today trying to optimize my code so that it fits the TL for python, and failing, ultimately discovering I could have contended with my first solution.

PS: I enjoyed the questions but my time would have been much more fruitful had the constraints been more clearer. Thank you for your efforts!

First of all, this is an overstatement:

is generally considered to not pass if coded in python

PyPy can deal with that many arithmetic operations just fine.

Secondly, a number with many factors gets factored faster than in sqrt operations, and a number with few factors makes it faster to loop over them. Hence your worst case is not achievable in practice.


PyPy can deal with that many arithmetic operations just fine.

Oh, okay, I did not know that, I apologize for not researching it enough.

a number with many factors gets factored faster than in sqrt operations

Yes, on average, if numbers are chosen (pseudo-) randomly, but if i make a test case just to push the program to it’s limits, say by taking all prime numbers near 10^9, the TC for the first part of problem should approach sqrt(x), right?

But now that you say PyPy can handle even the worst cases, this definitely doesn’t matter. Thank you for your time and explanation. And sorry for the previous rant :sweat_smile:


There is some nuance here, however - you definitely can’t expect PyPy to be fast all the time, and certain operations are faster than others (so an approximate count of the operations isn’t enough to tell you the whole picture).

I’m not really an expert on this, but in my experience, here’s some stuff that’s slow:

  • Input/output in general, specially the print function.
    For example, in ADVITIYA4 I actually got a TLE with a linear-time algorithm that called the print function 4\cdot 10^5 times (and does only about 4\cdot 10^5 operations otherwise), and had to optimize it by first storing all the answers in a list and then printing that list once at the end.
    The constraints there were later reduced to allow the natural implementation to pass.
  • I recall sorting a list of tuples being much slower than sorting integers.
  • Working with multidimensional arrays can be quite slow too.
  • Very rarely, things you might not expect have the wrong complexity.
    For instance, s += t where s and t are both strings is actually \mathcal{O}(|s| + |t|), so creating a string of length N by appending to it N times is in fact quadratic (surprisingly, CPython doesn’t have this behavior, and so is faster here)

All this aside, it also of course matters whether the operations are simple or not.
3\cdot 10^7 arithmetic operations should run perfectly fine within a couple seconds, 3\cdot 10^7 dictionary accesses (on a large-ish dictionary) probably won’t.
In fact, even 10^8 operations are not an issue if they’re simple enough (see ADVITIYA11, which needs about 10^8 operations, but they’re simple enough that my PyPy code runs in less than a second).

tl;dr PyPy can run decently fast, but you better know what you’re doing when using it (and remembering things in terms of operation limits probably isn’t the best idea).
i think if you’re not 100% confident it’s going to be fast enough, just use a faster language.

1 Like

Hey, i got the intuition but still not able to understand how it comes like i want to calculate for x=900 and p=2 so the answer is 8 and i got two-30 and 810000 but want to know how to get others by this logic just tell me…

can some1 tell whats the issue, i guess its the same as the authors code just that i am using maps to store prime factors with their powers

Clear the map before using in every test case, Because some previous primes will be there of previous test cases. [ just do m.clear() before solve() ]