SEAGCD2 - Editorial

PROBLEM LINK:

Contest
Practice

Author: Sergey Nagin
Tester: Kevin Atienza
Translators: Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza

DIFFICULTY:

Medium

PREREQUISITES:

Dynamic programming, combinatorics, number theory

PROBLEM:

How many arrays A[1\ldots N] are there such that 1 \le A_i \le M and \gcd(A_i,A_j) = 1 for all i < j?

QUICK EXPLANATION:

Every number > 1 appears at most once in the sequence. Everything else is 1.

Let S be the set of values of the sequence that are > 1. Our strategy will be to enumerate all possible sets S, and then sum up all their contributions to the total answer.

The first step is to enumerate all possible S's by enumerating all subsets of \{2, 3, \ldots, 100\} that are pairwise coprime. Actually, we only need the sizes, so this can be done very efficiently with memoization and some optimizations.

The next step is to add the contribution of each set S. We only need its size, s. There are \frac{N!}{(N-s)!} ways to place these values in the sequence, and everything else will be 1. So the answer is the sum of \frac{N!}{(N-s)!} over all enumerated sets S.

EXPLANATION:

For M \le 10

For any two numbers a and b, \gcd(a,b) = 1 if and only if a and b have no common prime factors. In other words, if we define P(n) as the set of prime factors of n (for example, P(12) = \{2,3\}), then \gcd(a,b) = 1 if and only if P(a)\cap P(b) = \emptyset.

Since there are only four primes \le 10, namely \{2,3,5,7\}, we can formulate a dynamic programming solution that has approximately 2^4 \cdot N states. If S is a subset of \{2,3,5,7\}, define f(n,S) as the number of sequences such that:

  • There are n elements.
  • Each element is in [1,M].
  • \gcd(A_i,A_j) = 1 for all i < j.
  • The only allowed prime factors are in S.

Clearly the answer that we want is f(N,\{2,3,5,7\}). Also, we can implement a subset of \{2,3,5,7\} as a bitmask of 4 bits, which is simply an integer from 0 to 15.

To find a recurrence for f(n,S), we enumerate all possible values of the n-th element, from 1 to M. For a value v to be a valid last element, its set of prime factors, P(v), must be a subset of S. (Using bitmasks, this check can be implemented with bitwise operators.) Afterwards, the set of allowed primes for the remaining n-1 elements is S - P(v), where “-” denotes set difference. Thus, we have the following recurrence:

f(n,S) = \sum_{\substack{v=1 \\ P(v)\subseteq S}}^M f(n-1,S - P(v))

For optimization, the values P(v) can be precomputed. Also, for the base case, we have f(0,S) = 1, counting the empty sequence. Thus, we can now compute f(n,S) using dynamic programming! See the following pseudocode:

for n = 0..N:
    for S subset of {2,3,5,7}:
        if n == 0:
            f[n][S] = 1
        else:
            f[n][S] = 0
            for v = 1..M:
                if P(v) is a subset of S:
                    f[n][S] += f[n-1][S - P(v)]

print f[N][{2,3,5,7}]

Representing these sets as bitmasks and using bitwise operators for set operations, we get the following more detailed pseudocode:

// precompute P
P[1..M]
for v = 1..M:
    if v % 2 == 0: P[v] |= 1 << 0
    if v % 3 == 0: P[v] |= 1 << 1
    if v % 5 == 0: P[v] |= 1 << 2
    if v % 7 == 0: P[v] |= 1 << 3

// compute f
f[0..N][0..15]
for n = 0..N:
    for S = 0..15:
        if n == 0:
            f[n][S] = 1
        else:
            f[n][S] = 0
            for v = 1..M:
                if (S & P[v]) == P[v]:
                    f[n][S] += f[n-1][S ^ P[v]]

print f[N][15]

It’s now much easier to see how many operations this will take: roughly 2^4 \cdot N \cdot M, which is fast enough for the first subtask!
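For reference, here is a minimal runnable sketch of this DP in Python (the function name and the final print are our own illustration; we also reduce modulo 10^9 + 7, the modulus used later in this editorial):

MOD = 10**9 + 7

def count_sequences(N, M):
    # P[v] = bitmask of the primes from {2, 3, 5, 7} dividing v (valid for M <= 10)
    P = [0] * (M + 1)
    for v in range(1, M + 1):
        for bit, p in enumerate((2, 3, 5, 7)):
            if v % p == 0:
                P[v] |= 1 << bit

    # f[S] = number of valid sequences of the current length using only primes in S;
    # one layer suffices since f(n, *) depends only on f(n-1, *)
    f = [1] * 16                      # base case n = 0: the empty sequence
    for _ in range(N):
        g = [0] * 16
        for S in range(16):
            for v in range(1, M + 1):
                if S & P[v] == P[v]:  # P(v) is a subset of S
                    g[S] = (g[S] + f[S ^ P[v]]) % MOD
        f = g
    return f[15]

print(count_sequences(3, 4))  # prints 22: one all-1s sequence, 3*3 with one big number, 2*6 with two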

For M > 10

Unfortunately, for the remaining subtasks, M can reach up to 100, and there are 25 primes up to 100. This means that the algorithm above will have 2^{25}N states. For subtask two, you might still be able to optimize a lot and squeeze into the time limit (though it isn’t easy), but for subtask three, there’s no way this will pass. Instead, we need to find better solutions.

Solution 1

Let’s call a number big if it is greater than 1.

The first important insight is that every big number will appear at most once in our sequence. Otherwise, if x > 1 appears more than once, say at positions i and j, then \gcd(A_i,A_j) = x \not= 1, which violates our requirements. This automatically means that there are at most 99 big numbers in our sequence, and for large N, this means that most elements will be equal to 1!

Thus, we have two remaining tasks:

  • Enumerate all sets of big numbers that are pairwise coprime, and
  • For each set of big numbers, find the number of valid sequences of length N having exactly these numbers as its big numbers.

The second one is actually easy: If the set of big numbers has size s, then there are exactly {N \choose s} ways to choose the places where these big numbers will go, and then s! ways to choose which big number goes where. Everything else will be 1, thus, there are {N \choose s}s! = \frac{N!}{(N-s)!} valid sequences!

To compute N! and (N-s)!^{-1}, we need to precompute all factorials and inverse factorials up to 100000 beforehand. We can use the following simple recurrences:

\begin{aligned} N! &\equiv N(N-1)! \bmod (10^9 + 7) \\ N!^{-1} &\equiv N^{-1}(N-1)!^{-1} \bmod (10^9 + 7) \end{aligned}
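In Python, this precomputation might look as follows (a sketch; MAXN = 100000 here is the largest N, and the single modular inverse at the top is taken via Fermat’s little theorem):

MOD = 10**9 + 7
MAXN = 100000

# fact[i] = i! mod p
fact = [1] * (MAXN + 1)
for i in range(1, MAXN + 1):
    fact[i] = fact[i - 1] * i % MOD

# inv_fact[i] = (i!)^(-1) mod p: one modular exponentiation at the top,
# then the downward recurrence (i-1)!^(-1) = i * (i!)^(-1)
inv_fact = [1] * (MAXN + 1)
inv_fact[MAXN] = pow(fact[MAXN], MOD - 2, MOD)
for i in range(MAXN, 0, -1):
    inv_fact[i - 1] = inv_fact[i] * i % MOD

def arrangements(n, s):
    # n! / (n-s)! mod p: ways to place s distinct big numbers into n positions
    return fact[n] * inv_fact[n - s] % MOD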

All that remains is enumerating all sets of big numbers that are pairwise coprime. Let’s call such sets big sets. To begin with, let’s try a simple recursive enumeration. Let’s create a routine called all_big_sets(m, S), where m is the maximum allowed value in the big set, and S is the set of disallowed primes. The initial call is simply all_big_sets(M, {}), where {} denotes the empty set.

A straightforward implementation is the following:

def all_big_sets(m, S):
    if m == 1:
        yield {}
    else:
        // if m is not in the set
        for big_set in all_big_sets(m-1, S):
            yield big_set

        // if m is in the set
        if P(m) and S don't intersect:
            for big_set in all_big_sets(m-1, union(P(m), S)):
                yield union({m}, big_set)

This yields all big sets. In most languages, you can instead collect all the yielded sets into a giant list, like in the following:

def all_big_sets(m, S):
    big_sets = []
    if m == 1:
        big_sets.add({})
    else:
        // if m is not in the set
        for big_set in all_big_sets(m-1, S):
            big_sets.add(big_set)

        // if m is in the set
        if P(m) and S don't intersect:
            for big_set in all_big_sets(m-1, union(P(m), S)):
                big_sets.add(union({m}, big_set))

    return big_sets

Unfortunately, this doesn’t work as it is, because there are just too many big sets. So here, we will describe a few optimizations that will make things faster.

First, notice that we can do a simple optimization: instead of the sets themselves, we can just enumerate the sizes of the big sets, because the sizes are all we need.

def all_big_set_sizes(m, S):
    sizes = []
    if m == 1:
        sizes.add(0)
    else:
        // if m is not in the set
        for size in all_big_set_sizes(m-1, S):
            sizes.add(size)

        // if m is in the set
        if P(m) and S don't intersect:
            for size in all_big_set_sizes(m-1, union(P(m), S)):
                sizes.add(size + 1)

    return sizes

A second optimization is to collect the sizes as a list of frequency counts for each size, because the largest size of each big set is at most m:

def all_big_set_sizes(m, S):
    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]

        // if m is in the set
        if P(m) and S don't intersect:
            counts = all_big_set_sizes(m-1, union(P(m), S))
            for i in 0..m-1:
                size_counts[i + 1] += counts[i]

    return size_counts

But still, this isn’t enough; even though the output list is now small, we still call all_big_set_sizes too many times. Well, that’s easy to fix. We just memoize all_big_set_sizes!

memo = {}
def all_big_set_sizes(m, S):
    // check if it's in the memo
    if memo has (m, S) as a key:
        return memo[(m, S)]

    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]

        // if m is in the set
        if P(m) and S don't intersect:
            counts = all_big_set_sizes(m-1, union(P(m), S))
            for i in 0..m-1:
                size_counts[i + 1] += counts[i]

    memo[(m, S)] = size_counts
    return size_counts

But this is still not enough, because there are still too many distinct calls to all_big_set_sizes! (For instance, there are up to 2^{25} distinct values for the parameter S, because it is a set of primes \le 100.)

At this point, there are two major optimizations that will allow all_big_set_sizes to run quickly. You can implement either one of them (or even both) to pass.

Optimization 1: Primes > 50

This optimization uses the fact that primes > 50 never appear together with other prime factors: if p > 50, then 2p already exceeds 100, so p can only occur in the sequence as the bare value p. Thus, we can try to consider these primes separately.

The first step is to create a version of all_big_set_sizes but which only considers primes up to 50:

memo = {}
def _all_big_set_sizes(m, S):
    // check if it's in the memo
    if memo has (m, S) as a key:
        return memo[(m, S)]

    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = _all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]

        // if m is in the set
        // Note: max_prime(m) is the largest prime factor of m
        if max_prime(m) <= 50 and P(m) and S don't intersect:
            counts = _all_big_set_sizes(m-1, union(P(m), S))
            for i in 0..m-1:
                size_counts[i + 1] += counts[i]

    memo[(m, S)] = size_counts
    return size_counts

The added check here is max_prime(m) <= 50, which prevents adding primes > 50. Next, we implement the real all_big_set_sizes, which takes into account these primes:

def all_big_set_sizes(m):
    P = (number of primes in the range [51,m])
    size_counts = [0]*(m+1)
    counts = _all_big_set_sizes(m, {})
    for i = 0..m:
        if counts[i] != 0:
            for j = 0..P:
                size_counts[i+j] += counts[i] * CHOOSE(P, j)
    return size_counts

Here, we initially call _all_big_set_sizes above, and then take into account the primes > 50 using some combinatorics: j is the number of primes > 50 that we include, and CHOOSE(P, j) is the number of ways to select these j primes out of the P possible ones (for m = 100, there are P = 10 such primes). The function CHOOSE itself can be computed using factorials and inverse factorials, which we need anyway to compute \frac{N!}{(N-s)!} in the second step above.
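With the factorial tables from the earlier sketch, CHOOSE becomes a one-liner (assuming the fact and inv_fact arrays defined above):

def choose(n, k):
    # C(n, k) mod p via precomputed factorials and inverse factorials
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % MOD * inv_fact[n - k] % MOD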

Using this optimization, all_big_set_sizes now runs fast enough to pass the time limit!

Optimization 2: Limiting the values of S

A different optimization involves looking at the parameter S more carefully. The purpose of this parameter is to exclude primes that have already been included before. However, we are iterating m in decreasing order, which means that the maximum prime that will possibly appear is the largest prime \le m. Thus, if there are primes in S that are greater than m, then we can safely remove them and the answer will stay the same! This reduces the number of distinct all_big_set_sizes calls drastically, and in fact is fast enough to get Python accepted in under 0.5 seconds!

memo = {}
def all_big_set_sizes(m, S):
    // cleanse S of primes > m
    Let X = (set of all primes <= m)
    S = intersection(S, X)

    // check if it's in the memo
    if memo has (m, S) as a key:
        return memo[(m, S)]

    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]

        // if m is in the set
        if P(m) and S don't intersect:
            counts = all_big_set_sizes(m-1, union(P(m), S))
            for i in 0..m-1:
                size_counts[i + 1] += counts[i]

    memo[(m, S)] = size_counts
    return size_counts

The change here is in the first few lines, where a call to intersection is done to reduce the number of elements of S.
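To make the whole approach concrete, here is a condensed runnable Python sketch of Solution 1 with Optimization 2. The bitmask encoding, the memo keys and the final summation are our own illustration of the idea, not the tester’s exact code:

MOD = 10**9 + 7
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
          53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

def solve(N, M):
    # pmask[v] = bitmask of the primes dividing v
    pmask = [0] * (M + 1)
    for i, p in enumerate(PRIMES):
        for v in range(p, M + 1, p):
            pmask[v] |= 1 << i

    # primes_upto[m] = bitmask of all primes <= m, used to cleanse S
    primes_upto, mask = [0] * (M + 1), 0
    for m in range(1, M + 1):
        if m in PRIMES:
            mask |= 1 << PRIMES.index(m)
        primes_upto[m] = mask

    memo = {}
    def sizes(m, S):
        # frequency counts of big-set sizes over the values 2..m,
        # with the primes in S disallowed
        S &= primes_upto[m]                  # Optimization 2: drop primes > m
        if m == 1:
            return [1]                       # only the empty big set
        if (m, S) not in memo:
            counts = list(sizes(m - 1, S))   # case: m is not in the set
            if pmask[m] & S == 0:            # case: m is in the set
                sub = sizes(m - 1, S | pmask[m])
                counts += [0] * (len(sub) + 1 - len(counts))
                for i, c in enumerate(sub):
                    counts[i + 1] = (counts[i + 1] + c) % MOD
            memo[(m, S)] = counts
        return memo[(m, S)]

    # answer = sum over sizes s of count[s] * N!/(N-s)!
    ans = 0
    for s, c in enumerate(sizes(M, 0)):
        ways = 1
        for k in range(s):                   # N * (N-1) * ... * (N-s+1)
            ways = ways * (N - k) % MOD
        ans = (ans + c * ways) % MOD
    return ans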

This approach is implemented in one of the tester’s solutions below. (Sets are implemented as bitmasks.)

Solution 2

A different solution involves considering all primes > 10 specially, not just primes > 50. The insight that we will use heavily here is that each number \le 100 has at most one prime factor > 10 (two such factors would already give a product of at least 11^2 = 121 > 100).

We will need our f(n,S) function from earlier. Recall that f(n,S) is the number of valid sequences of length n where the only allowed prime factors are in S. This time, we will only precompute this table for subsets S of \{2, 3, 5, 7\}. (Obviously this isn’t enough because M > 10, but we will still need these values.)

The key insight here is that for every value 1 \le v < 10, the primes in the range \left(\frac{100}{v+1}, \frac{100}{v}\right] are all “functionally” identical. This is a generalization of the fact that we used earlier: that primes > 50 can be treated specially because they all appear alone. Similarly:

  • Every prime p in the range (100/3,100/2] = [34,50] can only appear as p or 2p.
  • Every prime p in the range (100/4,100/3] = [26,33] can only appear as p, 2p or 3p.
  • etc.

Thus, really, all we care about is how many primes from each of the following ranges we take: (100/2, 100/1], (100/3, 100/2], (100/4, 100/3], etc. Afterwards, we figure out using some combinatorics where their positions will be, and which ones will appear as p, 2p, 3p, etc. And then finally, once we take care of these primes, we can now fall back to our precomputed f(n,S), because all primes > 10 have been accounted for already!
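As a quick standalone illustration of this grouping (the helper function is our own), the ranges and their allowed multipliers for M = 100 can be listed as follows:

def primes_between(lo, hi):
    # tiny trial-division helper; fine for hi <= 100
    return [p for p in range(max(lo, 2), hi + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

M = 100
for v in range(1, 10):                      # a prime > 10 forces v <= 9
    lo, hi = M // (v + 1) + 1, M // v
    ps = [p for p in primes_between(lo, hi) if p > 10]
    if ps:
        print(f"primes in [{lo},{hi}]: {ps} -> appear as k*p, k = 1..{v}")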

Lots of details have been skipped, but if you want to find out more, this solution is implemented in one of the tester’s solutions below, so you can check it out.

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter
Tester 1
Tester 2

What’s the running time of the author’s and testers’ solutions? Are they faster than the other contestants’ solutions? I guess I could enter them myself under the practice section to see, but it doesn’t feel right to enter someone else’s code under my name.

The setter’s and tester’s solutions are not for this question.

Another addition: we can enumerate only the odd numbers \le 50. Since each set can contain at most one even number, we can determine how many even numbers are compatible with a given set; if there are k even numbers to choose from, a size-i odd set yields k sets of size i+1 and 1 set of size i. I solved it using the same method but with this even-number trick. Here’s the link to my C++ solution:

https://www.codechef.com/viewsolution/10070529

There also exist closed-form polynomial expressions for F(N,M):
F(N,1) = 1, F(N,2) = 1+N, F(N,3) = 1+N+N^2, F(N,4) = 1+N+2N^2, F(N,5) = 1+3N-N^2+2N^3, F(N,6) = 1+3N+2N^3, F(N,7) = 1-N+9N^2-4N^3+2N^4 … I don’t see any pattern as such, except that F(N,M) ~ O(N^{\pi(M)}), where \pi(M) counts the number of primes \le M.

Can anyone explain solution 2 in more depth ? Thanks in advance.

In the routine all_big_sets(m, S), what is meant by the comment // if m is not in the set? Which set is m being referred to?

Why is the author’s solution giving TLE?

Fix the solution links. They currently link to the DISTNUM2 solutions.

That line is just a comment saying that we’re considering the case where m is not in the big set we are constructing.