# LUCASTH - Editorial

Author: Zhuolin Yang
Tester: Hanlin Ren
Editorialist: Hanlin Ren

HARD

### PREREQUISITES:

Fast Fourier Transform, Divide and Conquer, Lucas Theorem

### PROBLEM:

Let f(n,k) be the sum of products of all size-k subsets of [n]=\{1,2,\dots,n\}. Formally

f(n,k)=\sum_{S\subseteq[n],|S|=k}\prod_{x\in S}x.

Given n\le 10^{501} and prime p\le 10^5, how many ways are there to select a k from 0 to n such that f(n,k) is not divisible by p?

### QUICK EXPLANATION:

• The answer is the number of nonzero coefficients of P(x)=\prod_{i=1}^n(x+i), under modulo p;
• Let c=\lfloor n/p\rfloor, b=n\bmod p. Then P(x)\equiv f(x)g(x) \pmod p, where f(x)=[\prod_{i=1}^p(x+i)]^c and g(x)=\prod_{i=1}^b(x+i);
• You can observe the fact that \prod_{i=1}^p(x+i)\equiv (x^p-x)\pmod p;
• Let \Delta(P) be the number of nonzero coefficients of polynomial P, then \Delta(P)=\Delta(f)\Delta(g), since f has “gap” p-1 and \deg(g) < p-1;
• \Delta(f) can be computed by Lucas Theorem; g can be computed by divide-and-conquer and FFT;
• The time complexity is the addition of two parts: converting n to p-ary which costs O(\log^2 n), and computing g which costs O(p\log^2 p).

### EXPLANATION:

In this subtask, n\le 5000. Let f[i][j] be the product of size-j subsets of \{1,2,\dots,i\}. Then we can obtain the following dp equation:

f[i][j]=f[i-1][j]+i\cdot f[i-1][j-1].

This is because, if we choose i into our subset, the product should be i\cdot f[i-1][j-1]; otherwise it should be f[i-1][j].

We can compute f[\cdot][\cdot] in O(n^2) time and then compute the answer.

In this subtask, n\le 200000. We need an observation: f(n,k) is the coefficient of x^{n-k} in the polynomial

\prod_{i=1}^n(x+i)=(x+1)(x+2)\dots(x+n).

Why is that? Suppose we want to compute the coefficient of x^{n-k} of above polynomial. Think about how you multiply polynomials, and you’ll realize that it’s just the sum of products over all choices of k subsets of \{1,2,\dots,n\}, which is the definition of f(n,k).

Now the task becomes to compute the product of n polynomials of the form (x+i). This can be done by divide and conquer, along with FFT. Let A(x) be the product of the first half of (x+i)'s, and B(x) be the product of rest (x+i)'s. We compute A(x) and B(x) recursively, and then compute A(x)\cdot B(x) by FFT. The time complexity is T(n)=2T(n/2)+O(n\log n), so T(n)=O(n\log^2 n). This idea is illustrated by the following pseudocode, where MUL(S) computes \prod_{s\in S}(x+s).

MUL(S)
if (|S| == 1) then return (x+s[0])
partition S into two sets S_1, S_2, each of size roughly |S|/2
A = MUL(S_1)
B = MUL(S_2)
return A*B //A*B is computed by FFT


After that, we just count the number of coefficients which can’t be divided by p.

Following what we did before, let P(x)=(x+1)(x+2)\dots (x+n)=a_0+a_1x+a_2x^2+\dots+a_nx^n, we know that the answer is the number of a_i's which can’t be divided by p. Let c=\lfloor n/p\rfloor and b=n\bmod p. Under modulo p, we have:

P(x)=[(x+1)(x+2)\dots (x+p)]^c(x+1)(x+2)\dots (x+b).

You can observe that(see the second last paragraph of Finite field)

(x+1)(x+2)\dots (x+p)\equiv (x^p-x)\pmod p,

Therefore

P(x)=(x^p-x)^c\cdot (x+1)(x+2)\dots(x+b).

Since we only want to count the number of nonzero coefficients(after modulo p), it’s no difference if we replace (x^p-x) by (x^{p-1}-1). We let f(x)=(x^{p-1}-1)^c, and g(x)=(x+1)(x+2)\dots(x+b). For convenience, if b=p-1 we let f(x)=(x^{p-1}-1)^{c+1} and g(x)=1. Let \Delta(f) be the number of nonzero coefficients of f, then \Delta(f\cdot g)=\Delta(f)\Delta(g), since g has degree < p-1 and f has nonzero coefficients only on x^{i(p-1)} terms.

### Solving \Delta(f)

Let’s suppose f(x)=(x^{p-1}-1)^k for some integer k. Then

f(x)=\sum_{i=0}^kx^{i(p-1)}(-1)^{k-i}\binom{k}{i}.

We write k,i in base p, say k=k_1k_2\dots k_l and i=i_1i_2\dots i_l. By Lucas’s Theorem, \binom{k}{i}\not\equiv 0\pmod p if and only if \forall 1\le j\le l, i_j\le k_j. Therefore \Delta(f)=(k_1+1)(k_2+1)\dots(k_l+1) since the j-th bit of i has (k_j+1) options.

### Solving \Delta(g)

\Delta(g) can be computed by the divide-and-conquer with FFT in subtask 2.

### Time complexity

We need O(len^2) time to convert n into base p, where len=O(\log n) is the length of n. We need a divide-and-conquer and FFT, whose time complexity is O(p\log^2 p). Thus the total time complexity is O(len^2+p\log^2 p).

### AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.
Tester’s solution can be found here.

2 Likes

There is a way to “solve” it using the O(n^2) algorithm, I’m both proud and disgusted of my solution.

From plotting the function f(i,j)\%P as a matrix you can see that it has a triangular shape that after P lines repeats itself in a pattern that looks a lot like the Sierpinski triangle. I then knew that I only needed the $(N%P)$th row of the triangle to solve the problem. This means I had to calculate f(N\%P,j) for all possible values of j.

Using the O(n^2) algorithm this can be done in O(N\%P)^2, which was far too slow if N\%P was big. I thought that probably the author had thought of this and had made N\%P really big.

However there is a simple way to calculate f(N\%P,j) for large values of N\%P. The function behaves periodic so I simply started out on row P and then went backwards up to row (N\%P). This worked =P. My passing solution had the time complexity O(min(N\%P,P-(N\%P))^2).

4 Likes

Nice problem! One nitpick in the description: it’s not always true that deg(g)<p−1, as n can be -1 mod p. This case can be handled by adding 1 to n, as this just introduces an extra factor of x (mod p) to the polynomial product, which doesn’t affect the number of non-zero coefficients. (I think your solution actually does this.)

You can reduce the time taken to work out Δ(g) to O(p.log§) operations if you want to, by using a single FFT of order p-1 in the multiplicative group mod p. That’s because this FFT converts the coefficients of a polynomial (of degree less than p-1) to the set of values it takes. So the inverse FFT, also an FFT, reconstructs the coefficients of g from the set of values the polynomial takes: g(1), …, g(p-1). These values can be worked out in O§ time because g®=(r+b)/r*g(r-1). The required FFT isn’t a power-of-2 size, but it can converted to three power-of-2 FFTs using a variant of Bluestein’s algorithm that doesn’t need (2N)^th roots of unity.

This is how one of my solutions worked. It made things maybe two or three times faster compared with a divide-and-conquer approach. (The gain wasn’t huge because the overheads are such that p around 100,000 still counts as ‘small’!) It takes 0.26 seconds using Python+numpy.

time limit for this problem must be 1 or 2 sec(hard problem). because i see some approach that not considered for this problem.

1 Like

To illustrate what is talked about in gorre_morre’s post here is two images showing an image plot of f(n,k)%p!=0 (where black is true, white is false) with n along the vertical axis and k along the horisontal axis.

Here p=5 shows the Seirpinski fractal/gasket (large image, so expand at will):

Click to view

And here p=101 shows the detail of a single element, where we note that all base triangles are the identical:

We’re interested in the number of black pixels in some row. So just by finding the answer for the base triangle we can use the structure of the Sierpinski fractal to compute the answer for an arbitrary row. Specifically this involves interpreting the row index as a number in base p to see how many base triangles there is for a given index.

2 Likes

@userhacker no 1 sec is also too long … it should be 0.000…inf…1 nano seconds

Using OEIS, I noticed the given example was a sequence produced by Stirling Numbers of the First Kind. I then looked for formulas for Stirling numbers. I used FFT for numbers less than 100K.

Can anyone explain this line “Let Δ(f)Δ(f) be the number of nonzero coefficients of ff, then Δ(f⋅g)=Δ(f)Δ(g)Δ(f⋅g)=Δ(f)Δ(g), since gg has degree <p−1<p−1 and ff has nonzero coefficients only on xi(p−1)xi(p−1) terms.”

I mean how is delta(f.g) =delta(f)*delta(g) ?

how can you be sure that after multiplying f and g, no coefficient will become zero. I understand that all terms will have different powers. But that does not stop a coefficient becoming 0 mod p.

Can u please how did u calculate the ans by calculating N%p rows? I too saw some trinagles bt wasnt able to deduce anything

@algmyr about time complexity of this if N was equal P/2 now what is time complexity of this?!

@userhacker The case n=p/2 is not actually the worst case, it’s more like n=p/\sqrt{2} (so about 70000 for the largest allowed prime). And as for time complexity: Finding the correct row is quadratic in n\,\textrm{mod}\,p. Finding the number of triangles for a given row is more or less O(\log(n)), at least small enough to be insignificant.

If you are clever about searching from the top or bottom depending on what row you want, you ended up passing the test cases, although you can easily devise cases that kills such a solution, like n=p/\sqrt{n}.

@gorre_morre said time complexity so according that i choose N=49999 and p=99991 then we have
O(min(49999,49992)^2)=O(49992^2)=O(2499200064) and we have 4 test case?

i think you getting close to main solution. in main solution we find a general pattern for every N according to p,p^2,p^4,p^8,p^16 ,… <10^501. just this have one exception for that we need calculate F(value of exeption,p) with FFT. this is fastest way. your way is near it.

That time complexity is correct asymptotically, but the actual number of operations is a bit different since we work on a triangle. The point where working from above is more expensive is a bit lower down than half the triangle, essentially the horizontal line that cuts the triangle in two pieces with the same area.

This way is nowhere near the one expected one complexity wise. It’s just pure luck that this approach ends up working with the test cases, n\,\textrm{mod}\,p never ended up close to the worst case for this algo.

check this test case for TLE:
4
49999 99991
, 49999 99971
, 49999 99989
, 51999 99991

goodby man.good luck.

On my laptop just the first case takes like 6 seconds (pypy) which wouldn’t by itself be a TLE, since pypy for some reason has the same time multiplier as python, 5x. I think your test could potentially actually work if run on the server. However 50k is not the worst case, as stated. From what I measured on single cases, in (p,time) pairs: (50000,6.3), (60000,9.1), (70000,36.7), (80000,14.3), (90000,7.6)