CHANGE - Editorial

PROBLEM LINKS

Practice
Contest

DIFFICULTY

HARD

PREREQUISITES

Generating Functions, Partial Function Decomposition, Complex Numbers

PROBLEM

In a currency there are N denominations of coins, given as an array D. All elements in D are pairwise distinct and pairwise relatively prime.

Can you use this property of the coinage to design a fast algorithm to answer queries of the type

  • How many ways exist of making change for target S? Of course, the answer should be found modulo MOD.

EXPLANATION

There is a popular technique for creating a generating function for this the number of ways of making change. Let us quickly look at the appropriate generating function that gives us the answer.

G(x) = PROD ( SUMMA ( xu * d, 0 ≤ u < ∞ ), d ∈ D )

The coefficient of xS in G(x) (modulo MOD), will be the answer.

We can multiply and divide each term inside the product in G(x) with (1-xd). This gives us the following simplified expression for G(x)

         1
PROD ( ------ , d ∈ D ) ... (i)
       1 - xd

We can re-write (i) as

   1                        1
-------- * PROD ( -------------------- , d ∈ D ) ... (ii)
(1 - x)n          1 + x + x2 + ... xd-1

For some polynomial,

f(x) = 1 + x + x2 + ... xn-1

And an complex nth root of unity wn,

wn = eι /n

We can easily prove that

f(wni) = 0, 1 ≤ i < n

Thus, we can factorize f(x) as

(x - wn) (x - wn2) ... (x - wnn-1)

Now, we can transform (ii) into

   1                        1
-------- * PROD ( PROD ( ------- , 1 ≤ i < d ) , d ∈ D ) ... (iii)
(1 - x)n                 x - wdi

From the problem description, we are assured that all d are pairwise prime. This is a very important because this means that all wdi are also pairwise distinct! This means that each wdi is a simple root of the denominator. Hence, the Partial Fraction Expansion of

                 1
PROD ( PROD ( ------- , 1 ≤ i < d ) , d ∈ D ) ... (iv)
              x - wdi

Can eventually be written as

            P(wdi)
SUM ( SUM ( ------- , 1 ≤ i < d ) , d ∈ D ) ... (v)
            x - wdi

Let us find P(x). Note that P(x) will depend on d. Hence all the calculations below should be assumed to already be within the closure with i and d available from (v).

We know that


1

PROD ( 1 + x + x2 + … xd’-1 , d’ ∈ D )

	            P(w<sub>d'</sub><sup>j</sup>)
  = SUM ( SUM ( -------- , 1 ≤ j < d ) , d' ∈ D ) ... (vi)
	        	x - w<sub>d'</sub><sup>j</sup>

Let us multiply the LHS and RHS with (x - wdi).


x - wdi

PROD ( 1 + x + x2 + … xd’-1 , d’ ∈ D )

	                        P(w<sub>d'</sub><sup>j</sup>)
  = (x - w<sub>d</sub><sup>i</sup>) * SUM ( SUM ( -------- , 1 ≤ j < d' ) , d' ∈ D ) ... (vii)
	        	            x - w<sub>d</sub><sup>i</sup>

Now, let us keep x = wdi in (vii).

  • The RHS reduces to P(wdi), since all other terms become 0.
  • The LHS becomes of the form 0 / 0
    • But we know that this function is well defined for all real x

Thus, we use the L’Hopital’s rule for the LHS and take the derivatives of the numerator and the denominator. Out of the N terms in the first derivative of the denominator of the LHS we note that only one of them is non zero. Hence, we get

		                     1
P(wdi) = -------------------------------------------
		 (1 + 2wdi + 3wd2i + ... (d-1)wd(d-2)i)
		     * PROD ( 1 + wdi + wd2i + ... wd(d'-1)i , d' ∈ D AND d' ≠ d )

Now let us simplify the above a bit. We use the Sum of AGP to simplify the first term. We also use

  • wdd = 1
  • SUM ( wdi , 0 ≤ i < d ) = 0
    • we used this above already when we said that all except one term in the first derivative of the denominator of the LHS will be 0

to further simplify things above.

For simplicity, let us define P(d,y). y will always be a complex dth root of unity.

		                      y * (y - 1)
P(d,y) = ------------------------------------------------------
		 d * PROD ( SUM ( yi, 0 ≤ i ≤ (d' modulo d) ), d' ∈ D )

Now, expression for G(x) looks like

   1                   P(wdi)
-------- * SUM ( SUM ( ------- , 1 ≤ i < d ) , d ∈ D ) ... (viii)
(1 - x)n               x - wdi

We use the following Partial Fraction Decomposition to furthur break down the above

   1          1
-------- * ------- 
(1 - x)n   (x - y)

                   1                             1
         = ------------------ + SUM ( ---------------------- , 1 ≤ i ≤ N)
           (1 - y)N * (x - y)         (1 - y)N-i+1 * (x - 1)i

(This too can be proven easily by repeatedly using the L’Hopital’s Rule!)

Now, we have G(x) broken into non reducible fractions.

                       P(d,wdi)                        P(d,wdi)
G(x) = SD( Si ( ---------------------- + Sj ( ----------------------- ) ) )
                (1 - wdi)N * (x - wdi)         (1 - wdi)N-j * (x - 1)j

Where,
0 ≤ j < N
1 ≤ i < d
d ∈ D

Now, we can use the Generalized Binomial Theorem to find the coefficient of xS in G(x). The expression for the coefficient will look like

            Ud(wdi)
SUM ( SUM ( ------- , 1 ≤ i < d ) , d ∈ D )
            Vd(wdi)

Let, Qd(x) = 1 + x + x2 + … xd-1

As we can see above, we want to calculate

  • Ud(y) / Vd(y) for some y, such that Qd(y) = 0

Let Zd(y) be a polynomial of degree strictly less than d-1 such that

  • Vd(y) * Zd(y) = 1 modulo Qd(y) = 0

This means that

  • Vd(y) * Zd(y) = 1 + Rd(y) * Qd(y)

Now,

  • 1 / Vd(y) = Zd(y) / (Vd(y) * Zd(y))
    • = Zd(y) / (1 + Rd(y) * Qd(y))

But, Qd(y) = 0 for all y that we want to evaluate Vd(y) for.

Thus, we can run the Extended Euclid’s Algorithm for univariate polynomials with coefficients in a field to find the inverse of Vd(y) modulo Qd(y).

This gives us a polynomial Td(y) to play with!

The answer will be

SUM ( SUM ( Td(wdi) , 1 ≤ i < d ) , d ∈ D )

But, calculating Td(y) requires a lot of calculations in complex numbers whose coefficients are irrational. We cannot find the result modulo MOD just with this knowledge yet.

Now comes another proof that simply eliminates all the complex numbers from all the calculations!

Let

Td(y) = t0 + t1y + t2y2 + ... td-1yd-1

We can infer from all the above discussions that

  • Ud(y) has degree 1
  • Zd(y) has degree d-2
  • Thus, Td(y) has degree d-1
SUM ( Td(wdi) , 1 ≤ i < d )
		= SUM ( tp * SUM ( wdi * p , 1 ≤ i < d ) , 0 ≤ p < d )
		= SUM ( tp * SUM ( wdi * p , 1 ≤ i < d ) , 1 ≤ p < d ) + t0 * (d - 1)

We know that Qd(wdp) = 0 for any non-zero t.
Thus, SUM ( wdi * p , 1 ≤ i < d ) = -1
We can simplify the above as

		= SUM ( -tp, 1 ≤ p < d ) + t0 * (d - 1)
		= d * t0 - SUM ( tp , 0 ≤ p < d )
		= d * Td(0) - Td(1)

This allows us to use the polynomial Zd(y) that we generated for evaluation only for 0 and 1.

It is worth noting that the large S is not a problem at all while solving the problem. It appears only in the exponent of wd. This lets us consider S modulo d conveniently.

Doing this for each d, solves the problem.

There lie several simplifications to the polynomials along the way that can be exploited to make the code neater.

SETTER’S SOLUTION

Can be found here.

TESTER’S SOLUTION

Can be found here.

6 Likes

I will try to understand what you wrote in the editorial, but in the mean time let me present my (very different and more digestible) solution which got Accepted during the contest. We have the equation: D1 * x1 + D2 * x2 + … + DN * xN = C, where xi >= 0. We can think of the base 2 representation of each xi: xi = 2^0 * x(i,0) + 2^1 * x(i,1) + … + 2^(K-1) * x(i,K-1), where K is the number of bits in the base 2 representation of C. Note how each of the values x(i,j) is either 0 or 1 and can be chosen independently from the others.

With this observation we will use the following dynamic programming algorithm: we compute CNT(p, S) = the number of ways of properly obtaining the first p bits of C (i.e. the least significant bits, from 0 to p-1), such that we have an addition carry of S. It should be mentioned that S can never exceed D1+D2+…+DN.

Initially, we will have CNT(0, 0) = 1 and CNT(0, S>0) = 0. In the end, the result will be found in CNT(K, 0) (where K is the number of bits of C).

Let’s assume that we have the values CNT(p, S) computed and let’s see how to compute the values CNT(p+1, S). We will run a knapsack-like algorithm starting from the values CNT(p, S). We will consider the numbers D1, D2, …, DN and each of them can be selected 0 or 1 times (because they correspond to deciding whether the x(p,…) values are 0 or 1).

The knapsack-like algorithm is very simple: think about computing CNTTMP(i,S) = the number of ways of obtaining the sum S for the current bit p after considering the numbers D1, …, Di. Initially, CNTTMP(0,S)=CNT(p,S). Then, CNTTMP(i,S) = CNTTMP(i-1,S) + CNTTMP(i-1,S-Di) (i.e. we either use Di as part of S or we don’t). After running this knapsack-like algorithm, we will have the values CNTTMP(N,S) computed, for each S up to 2 * (D1+…+DN) (this is because S can be as large as D1+…+DN from the previous bit and, in the worst case, when all the Dis are selected, we can get up to 2 * (D1+…+DN)).

After we have the values CNTTMP(N,S), getting the values CNT(p+1,S) is very easy. Let r be the pth bit of C (r is either 0 or 1). We will have CNT(p+1,S)=CNTTMP(N,2*S+r) (i.e. we go through the sums S from CNTTMP with parity r and, since we move from p to p+1, the last bit is not carried, so S will contribute with a carry equal to S/2).

The time complexity of this solution is O(N * S * log2©) (where S=D1+…+DN). If we do the math, it’s actually a pretty big number (N <= 50, S may be around 18.000 or perhaps even a bit more, and log2© is around 333-334). In order to have any hope of having such a solution Accepted, several optimizations must be made:

  • sort the Dis in asceding order (and use them in this order in the knapsack-like algorithm)
  • when running the knapsack-like algorithm never consider sums for which you are sure the result would be 0 (e.g. CNTTMP(i,S) may be non-zero only up to S=(D1+…+DN)+(D1+…+Di))
  • use a single array for computing everything (it was easier to explain in terms of CNT and CNTTMP as being different multidimensional arrays, but in the actual implementation you can actually use a single array which is updated accordingly)

With these optimizations and using unsigned int as the type to store the result, the solution can run in about 2 seconds (or slightly more) WITHOUT computing modulo 1000000007 (i.e. letting the additions simply overflow the unsigned int, which is the same as computing the result modulo 2^32).

Computing the numbers modulo 1000000007 turned out to be trickier than expected. The natural thing to do, after something was added to a value X, is the following: if (X >= 1000000007) X -= 1000000007 (note that I am not even considering using the actual modulo operator, which is much slower ; I am only considering the case when two values < 1000000007 are added together, in which case the above piece of code works fine). However, by doing this in the performance-critical knapsack loop, the running time increases up to 4 times! Then I tested the same code without the “if” condition (i.e. I always subtracted 1000000007 from X) and the running time was almost the same. Thus, the “if” instruction was the performance bottleneck. So what needed to be done was to decrement X by 1000000007 when needed, but without using an “if” instruction (i.e. by using only arithmetic and bit operators).

In the end, I replaced the above piece of code by X -= 1000000007 * ((X >> 30) & 3). Note that this is not exactly equivalent with the previous piece of code: for instance, if X is larger than 1000000007 but below 2^30 (which is 1073741824), X is not changed. But it’s OK if these values remain >= 1000000007, as long as adding two such values together will not cause an addition overflow (which, obviously, doesn’t happen for unsigned ints).

Outside of the performance-critical knapsack loop (e.g. when setting CNT(p+1, S) = CNTTMP(N, 2 * S + r)) we can test if the values are >= 1000000007 and decrease them accordingly if needed.

As a side note, this solution can also be implemented without running a knapsack-like algorithm for each bit of C. Instead, in order to compute the values CNT(p+1,S), we need to do a convolution of two sequences (one of them is CNT(p,…) and the other one is SCNT(…), where SCNT(S)=the number of ways of obtaining the sum S using each of the values D1,…,DN at most once). I actually implemented this (using the convolution code of the official solution from the problem QPOLYSUM), but it is many times slower than simply using the knapsack-like algorithm for each bit of C.

10 Likes

Oh. Is that all we had to do? :wink:

Every time I think I’ve seen the limits of the Making Change problem pushed as far as they can go, someone pushes them farther. Very nice.

2 Likes

It’s too hard to read the editorial…The complexity of my solution is O(n^2*D),a bit better than setter’s solution. My idea is similar to setter’s idea.But I don’t know what is the bottleneck of setter’s solution.
Maybe I’ll explain my idea in some days.

1 Like

Also, reading your alternative approaches and see how you tackle problems in such a methodic and direct way, is making me learn a lot as well!!

Congratulations and keep up the fantastic work :smiley:

hahahahaha :stuck_out_tongue:

1 Like

This is hardly readable. It would be much better to see a pdf written in latex.
What is the complexity of the setter’s solution? Was not sure whether my O((nA)^2) will pass. A<=500

@damians I tried very hard to describe the solution as best as I could :frowning: I wrote the entire thing in markdown. A pdf version would have been a lot better certainly, but I trying to deliver only what discuss.codechef.com consumed.

That said, it will be a lot of effort to write the same ideas in pdf (I do not know if there is a trivial way to convert markdown to pdf). I believe the rendering of markdown locally in my system is a lot better than here on discuss. Maybe I could find some tool to make a pdf here.

Let me try that.

The complexity of the setters solution is O(n * D * D)

Well, the description is very good, but fractions, products, sums etc. in plain HTML look poorly. I know there is no techniqual possibility to do it in a different way. Admins should consider adding latex support to discussion board.

@apia: It would be great if you could explain your solution, since yours was the fastest solution in the contest (and it seems that for good reasons, given the time complexity you mentioned).

3 Likes