PROBLEM LINK:

Author: Misha Chorniy

DIFFICULTY: Medium-Hard

PREREQUISITES: Dynamic programming, matrix exponentiation, Kirchhoff's matrix-tree theorem, Chinese remainder theorem, Garner's algorithm, big integers

PROBLEM: There are $N$ stars, and $M$ bidirectional space roads in space that connect pairs of stars together. Travelling through one space road takes 1 day. Chef has spaceships. Each spaceship is designated two stars, and a fixed route between these stars. A route is a path that is traversable within at most $K$ days; roads may be traversed any number of times. Also, there is a spaceship for every possible route between stars. The Empire deployed a weapon capable of destroying exactly $X$ spaceships that travel between the same pair of stars. The Empire used the weapon as many times as possible, so after some time there aren't enough groups of spaceships left to destroy. Chef then decided to retire most of the surviving spaceships. Chef wants to retire as many spaceships as possible, but in such a way that there is still a way to get from any star to any other using spaceships. How many ways are there for Chef to retire spaceships so that this is satisfied?

QUICK EXPLANATION: The answer is the number of spanning trees of a related multigraph, obtained by the following construction:
For each pair of stars $(i,j)$, put $P(i,j) \bmod X$ parallel edges between nodes $i$ and $j$, where $P(i,j)$ is the number of routes (paths of length at most $K$) between $i$ and $j$.

To compute $P(i,j)$: Let $G$ be the adjacency matrix of the input graph. Compute the matrix $M = G^K + G^{K-1} + \ldots + G^2 + G^1 + G^0$. Then $P(i,j) = M_{i,j}$. Note that $M$ can be computed with $O(\log K)$ matrix multiplications using a version of "fast exponentiation". Finally, to compute the number of spanning trees of a graph, you can use Kirchhoff's matrix-tree theorem. It says that the number of spanning trees can be obtained as the determinant of a specific matrix related to the graph. Unfortunately, the number of spanning trees can be very large, and may not fit in a 64-bit variable. One idea is to use big ints and fractions to compute the determinant, but this is slow. Instead, a better solution is to compute the determinant modulo ~150 primes around $10^9$, and reconstruct the full answer using the Chinese remainder theorem. (Garner's algorithm can be used for this.)

EXPLANATION:

Number of spaceships

The first step of our solution is to compute the number of spaceships between each pair of stars $(i,j)$. This is equal to the number of paths between $i$ and $j$ with length at most $K$. Let's denote this number by $P(i,j)$. We also denote by $P_k(i,j)$ the number of paths between $i$ and $j$ with length exactly $k$. Thus, we have the relationship: $$P(i,j) = P_0(i,j) + P_1(i,j) + P_2(i,j) + \ldots + P_K(i,j)$$ Now, what is $P_k(i,j)$? We can use dynamic programming to figure this out. Consider a path of length $k$ from $i$ to $j$. Let $t$ be the second-to-last node visited. Then the path consists of two parts: a path of length $k-1$ from $i$ to $t$, followed by the single road from $t$ to $j$.
Thus, the number of paths of length $k$ from $i$ to $j$ whose second-to-last node is $t$ is equal to $P_{k-1}(i,t)$! Summing across all possible values of $t$, we get: $$P_k(i,j) = \sum_{\text{$t$ adjacent to $j$}} P_{k-1}(i,t)$$ For the base case, we can take $k = 0$. In this case, the path is of length $0$, so $P_0(i,j) = 1$ if $i = j$, and $P_0(i,j) = 0$ if $i \not= j$. Thus, we now have a dynamic programming way to compute $P_k(i,j)$ for every $k$! Unfortunately, $K$ is very large, so this solution isn't feasible. We can restate the recurrence above in a different way. Let $G$ be the $N\times N$ adjacency matrix of the graph, i.e., $G(i,j) = 1$ if $i$ and $j$ are adjacent, and $G(i,j) = 0$ otherwise. Then the recurrence above is equivalent to: $$P_k(i,j) = \sum_{t=1}^N P_{k-1}(i,t)G(t,j)$$ However, if we view each $P_k$ as an $N\times N$ matrix itself, then this is just the equation of matrix multiplication! In other words, the above is equivalent to $$P_k = P_{k-1}\cdot G$$ where $\cdot$ denotes matrix multiplication. Also, based on the above, $P_0$ must be the $N\times N$ identity matrix (denoted $I_N$). Thus, we have: $$P_1 = I_N\cdot G = G,\qquad P_2 = P_1\cdot G = G^2,\qquad P_3 = P_2\cdot G = G^3,\qquad \ldots$$
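As a sanity check, the recurrence $P_k = P_{k-1}\cdot G$ can be implemented directly for small $K$. Below is a minimal Python sketch (function names are mine, not from the editorial); it is only feasible for small $K$, which is exactly the limitation the rest of the editorial addresses.

```python
def mat_mul(A, B, mod):
    """Multiply two square matrices, reducing entries modulo `mod`."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) % mod
             for j in range(n)] for i in range(n)]

def count_paths_naive(G, K, mod):
    """P[i][j] = number of paths of length <= K from i to j (mod `mod`).
    Runs in O(K * N^3) time, so it only works for small K."""
    n = len(G)
    Pk = [[int(i == j) for j in range(n)] for i in range(n)]  # P_0 = identity
    P = [row[:] for row in Pk]                                # running sum
    for _ in range(K):
        Pk = mat_mul(Pk, G, mod)                              # P_k = P_{k-1} * G
        for i in range(n):
            for j in range(n):
                P[i][j] = (P[i][j] + Pk[i][j]) % mod
    return P
```

For a triangle graph with $K = 2$, this gives $P(0,0) = 3$ (the empty path plus the two round trips) and $P(0,1) = 2$ (the direct road plus the detour through node 2).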
More generally, we have $P_k = G^k$, where $G^k$ denotes the matrix $G$ raised to the power $k$! This is certainly an elegant formulation of the above, but more importantly, it gives us a faster way of computing $P(i,j)$. Notice that if we also view $P$ as an $N\times N$ matrix, then the relationship above is equivalent to $$\begin{align*} P &= P_0 + P_1 + P_2 + \ldots + P_K \\ &= G^0 + G^1 + G^2 + \ldots + G^K \end{align*}$$ This sum looks very much like a geometric series, so it seems possible that a fast formula for it exists. Unfortunately, the standard formula $1 + x + x^2 + \ldots + x^k = \frac{x^{k+1} - 1}{x - 1}$ doesn't carry over, because we would have to define how to divide matrices, and even then, $G - I_N$ might not be invertible. This is a shame, because with such a formula we could compute the answer using only $O(\log k)$ multiplications via fast exponentiation. Thankfully, there is another way to compute this series using $O(\log k)$ multiplications without doing any division. Let's define $F(k) = G^0 + G^1 + G^2 + \ldots + G^{k-1}$. Thus, $P = F(K+1)$, so we can focus on computing $F(k)$. Recursively, this can be computed as: $$\begin{align*} F(k) &= G^0 + G^1 + G^2 + \ldots + G^{k-1} \\ &= G^0 + G(G^0 + G^1 + \ldots + G^{k-2}) \\ &= I_N + G\cdot F(k-1) \end{align*}$$ But the more important insight is the following relationship between $F(2l)$ and $F(l)$: $$\begin{align*} F(2l) &= G^0 + G^1 + G^2 + \ldots + G^{2l-1} \\ &= G^0 + G^1 + G^2 + \ldots + G^{l-1} + G^l + G^{l+1} + \ldots + G^{2l-1} \\ &= (G^0 + G^1 + G^2 + \ldots + G^{l-1}) + G^l(G^0 + G^1 + \ldots + G^{l-1}) \\ &= (I_N + G^l)(G^0 + G^1 + G^2 + \ldots + G^{l-1}) \\ &= (I_N + G^l)\cdot F(l) \end{align*}$$ This relation gives us the following "fast exponentiation" analog algorithm to compute $F(k)$: if $k = 0$, return the zero matrix; if $k$ is odd, return $I_N + G\cdot F(k-1)$; and if $k$ is even, return $(I_N + G^{k/2})\cdot F(k/2)$.
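A direct implementation of this recursion, which recomputes $G^{k/2}$ with ordinary fast exponentiation at every level, might look like the following Python sketch (illustrative names, not the author's code). This is the $O(\log^2 k)$-multiplication version that the next paragraph improves upon.

```python
def mat_mul(A, B, mod):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) % mod
             for j in range(n)] for i in range(n)]

def mat_add(A, B, mod):
    n = len(A)
    return [[(A[i][j] + B[i][j]) % mod for j in range(n)] for i in range(n)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def mat_pow(G, e, mod):
    """Standard fast exponentiation: O(log e) multiplications."""
    result = identity(len(G))
    while e:
        if e & 1:
            result = mat_mul(result, G, mod)
        G = mat_mul(G, G, mod)
        e >>= 1
    return result

def F(G, k, mod):
    """F(k) = G^0 + G^1 + ... + G^(k-1).
    Each even level recomputes G^(k/2) from scratch, so the total cost
    is O(log^2 k) matrix multiplications."""
    n = len(G)
    if k == 0:
        return [[0] * n for _ in range(n)]
    if k % 2 == 1:
        return mat_add(identity(n), mat_mul(G, F(G, k - 1, mod), mod), mod)
    half_pow = mat_pow(G, k // 2, mod)            # G^(k/2), recomputed here
    factor = mat_add(identity(n), half_pow, mod)  # I + G^(k/2)
    return mat_mul(factor, F(G, k // 2, mod), mod)
```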
Unfortunately, this doesn't quite run in $O(\log k)$ multiplications because we will need to compute $G^{k/2}$ every time. Computing $G^{k/2}$ using normal fast exponentiation every time will give us $O(\log^2 k)$ multiplications. But it can actually be reduced to just $O(\log k)$ by computing $G^k$ alongside $F(k)$. More specifically, instead of computing just $F(k)$, we compute $F(k)$ and $G^k$:
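The paired recursion can be sketched as follows (a hedged Python illustration under my own naming, not the author's code): each call returns both $F(k)$ and $G^k$, so every level costs only a constant number of multiplications.

```python
def mat_mul(A, B, mod):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) % mod
             for j in range(n)] for i in range(n)]

def mat_add(A, B, mod):
    n = len(A)
    return [[(A[i][j] + B[i][j]) % mod for j in range(n)] for i in range(n)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def F_and_pow(G, k, mod):
    """Return the pair (F(k), G^k), where F(k) = G^0 + ... + G^(k-1).
    Only O(log k) matrix multiplications in total."""
    n = len(G)
    if k == 0:
        return [[0] * n for _ in range(n)], identity(n)
    if k % 2 == 1:
        f, p = F_and_pow(G, k - 1, mod)
        # F(k) = I + G*F(k-1),   G^k = G * G^(k-1)
        return (mat_add(identity(n), mat_mul(G, f, mod), mod),
                mat_mul(G, p, mod))
    f, p = F_and_pow(G, k // 2, mod)
    # F(2l) = (I + G^l) * F(l),   G^(2l) = (G^l)^2
    return (mat_mul(mat_add(identity(n), p, mod), f, mod),
            mat_mul(p, p, mod))
```

The answer matrix is then `F_and_pow(G, K + 1, mod)[0]`, matching $P = F(K+1)$.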
This now uses just $O(\log k)$ multiplications! Since naïve matrix multiplication runs in $O(N^3)$ time, the overall running time of computing $P := F(K+1)$ is then $O(N^3 \log K)$.

Number of ways to retire spaceships

After the spaceships are built, the Empire's weapon destroys most of them. Specifically, every group of $X$ spaceships travelling between the same pair of stars is destroyed. This simply means you should reduce each $P(i,j)$ modulo $X$! At this point, Chef wants to retire as many spaceships as possible such that it is still possible to get from any star to any other star. Given a multigraph, what is the maximum number of edges you can remove so that it stays connected? We can rephrase this as: given a multigraph, what is the minimum number of edges you can keep so that the graph is connected? The latter version is much easier to answer: the answer is simply $N-1$, and the kept edges must form a spanning tree. Thus, the answer to this problem is the number of spanning trees of the multigraph with $P(i,j) \bmod X$ edges between every pair of nodes $(i,j)$! Counting spanning trees in multigraphs is a standard problem which can be solved by the well-known Kirchhoff's matrix-tree theorem. It says that the number of spanning trees can be obtained as a determinant of a specific matrix:
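For reference, the standard formulation of the theorem for multigraphs (this statement is the textbook one, not quoted from the editorial) can be written as:

$$L = D - A, \qquad \#\{\text{spanning trees}\} = \det\!\big(L^{(1,1)}\big),$$

where $A$ is the adjacency matrix with $A_{ij}$ equal to the number of parallel edges between $i$ and $j$, $D$ is the diagonal matrix of degrees (counting edge multiplicities), $L$ is the Laplacian, and $L^{(1,1)}$ denotes $L$ with its first row and first column deleted (deleting any row and the matching column gives the same value).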
This is quite elegant in my opinion, and its proof is actually accessible; a proof outline is given on its Wikipedia page. Converting this into an algorithm is quite straightforward. $L$ itself is quite easy to construct. The real challenge lies in computing the determinant. One can usually do this in $O(N^3)$ time with Gaussian elimination. However, we soon encounter a few problems when doing Gaussian elimination: the matrix entries are integers, but elimination introduces divisions, so exact arithmetic requires fractions, and the numerators and denominators of the intermediate values grow to $O(N)$ digits.
This means that the running time is not just $O(N^3)$, but $O(N^5)$ (assuming $N$ and $X$ fit in a machine word), because our intermediate numbers will have $O(N)$ digits, and we will need to multiply them in $O(N^2)$ time. This might be very slow for the final subtask. I tried implementing it but I didn't get it accepted no matter how hard I tried. Maybe you were able to get this method accepted, but instead we will show you a faster way.

Computing the big answer

Note that the problem would be much easier if it only asked for the answer modulo some prime $p$, say the usual $10^9 + 7$. We can perform Gaussian elimination in the world "modulo $p$" just fine, and what we get is the determinant of the matrix modulo $p$. This works because division by a nonzero number modulo a prime $p$ is possible and well-defined, which is not true in general when the modulus is composite. Another advantage of asking only for the answer modulo $10^9 + 7$ is that we don't need big ints to compute the answer! Now, the problem asks for the full answer. But suppose we knew that the answer is less than $10^9$. Note that Gaussian elimination doesn't guarantee that intermediate values are smaller than the final answer, so computing the exact determinant would still seem to require big ints and fractions. Instead, we can take advantage of the fact that if $x < 10^9$, then $x \bmod (10^9 + 7) = x$, so we can simply work modulo $10^9 + 7$! This saves us the need to use big ints and fractions. Note that we can adapt this method as long as we can put an upper bound on the answer. Specifically, suppose we know that the answer is less than some upper bound $U$. Then we can select any modulus $M > U$ (not necessarily prime) and just work modulo $M$, and the answer will still be correct! Unfortunately, in the current problem, the upper bound can be very, very large. Our best upper bound so far is $7777776^{99}\cdot 100^{98}$, which has $879$ digits.
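The "work modulo a prime" idea is the workhorse of the rest of the solution, so here is a short sketch of determinant computation modulo a prime via Gaussian elimination (illustrative Python; inverses are taken with Fermat's little theorem, which requires `p` to be prime):

```python
def det_mod(matrix, p):
    """Determinant of a square integer matrix modulo a prime p,
    computed with Gaussian elimination in O(N^3) modular operations."""
    n = len(matrix)
    A = [[x % p for x in row] for row in matrix]
    det = 1
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            return 0                      # singular matrix
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = (-det) % p              # a row swap flips the sign
        det = det * A[col][col] % p
        inv = pow(A[col][col], p - 2, p)  # modular inverse (Fermat, p prime)
        for r in range(col + 1, n):
            factor = A[r][col] * inv % p
            for c in range(col, n):
                A[r][c] = (A[r][c] - factor * A[col][c]) % p
    return det
```

For example, the Laplacian minor of a triangle graph is $\begin{pmatrix}2&-1\\-1&2\end{pmatrix}$, whose determinant is $3$, matching the three spanning trees of a triangle.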
Sure, we can find a prime number with more than $879$ digits and work modulo that prime, but this is still slow, because multiplying numbers that large is slow! (In fact, the running time is still $O(N^5)$.) The solution is to instead work modulo many small primes $p_1, p_2, \ldots, p_m$ and then combine the results using the Chinese remainder theorem. Specifically: choose distinct primes $p_1, p_2, \ldots, p_m$, each fitting in a machine word, whose product exceeds the upper bound $U$; compute the determinant modulo each $p_i$ separately with Gaussian elimination; then reconstruct the answer modulo $p_1 p_2 \cdots p_m$ from these residues.
Well, this works, but we still need to outline how to do that last step, which will inevitably require big ints. We will describe one way to do it in $O(m^2)$ time later on. Now, how fast does this run? Note that we're no longer multiplying big numbers, except in the last step, so each determinant is computed in actual $O(N^3)$ time. Thus, the running time is $O(mN^3 + m^2)$. Since the final result has $O(N)$ digits, we can select the primes such that $m = O(N)$. Thus, the whole running time is $O(N^4)$, which is an order of magnitude faster than $O(N^5)$! All that remains is to describe that last step.

Chinese remainder theorem

We have $m$ congruence relations, where the $i$th relation is $$x \equiv r_i \pmod{p_i}$$ and the $p_i$s are distinct primes. What we want is to compute $x$ modulo $p_1\cdot p_2\cdot p_3\cdots p_m$. The Chinese remainder theorem guarantees the uniqueness of this value. The basic version of the theorem involves only two congruences. Suppose we want to find $x$ modulo $m_1m_2$, given that $$x \equiv a_1 \pmod{m_1}$$ and $$x \equiv a_2 \pmod{m_2},$$ where $m_1$ and $m_2$ are coprime. Then the theorem says that: $$x \equiv a_1 n_2 m_2 + a_2 n_1 m_1 \pmod{m_1m_2}$$ where $n_2$ is the inverse of $m_2$ modulo $m_1$, and $n_1$ is the inverse of $m_1$ modulo $m_2$. For more than two congruences, repeatedly apply this version. But since we're talking about lots of primes $p_i$, applying the theorem naïvely means computing inverses of numbers modulo big integers, which sounds quite difficult. Fortunately, there's another way to combine congruences without this inverse computation. This is called Garner's algorithm, and we'll show how it works for the case $m = 4$, i.e. there are four congruence relations, and we want to compute $x$ modulo $p_1p_2p_3p_4$.
Notice that we can write $x$ modulo $p_1p_2p_3p_4$ uniquely in the form $$x \equiv x_1 + p_1x_2 + p_1p_2x_3 + p_1p_2p_3x_4 \pmod{p_1p_2p_3p_4},$$ where $0 \le x_i < p_i$ for every $i$. Thus, our goal now is to compute these $x_i$s. We can compute them one by one: reducing the relation modulo $p_1$ gives $x_1 \equiv r_1 \pmod{p_1}$, so $x_1 = r_1 \bmod p_1$. Reducing modulo $p_2$ gives $x_1 + p_1x_2 \equiv r_2 \pmod{p_2}$, so $x_2 = (r_2 - x_1)\,p_1^{-1} \bmod p_2$. Reducing modulo $p_3$ gives $x_1 + p_1x_2 + p_1p_2x_3 \equiv r_3 \pmod{p_3}$, so $x_3 = (r_3 - x_1 - p_1x_2)(p_1p_2)^{-1} \bmod p_3$, and similarly for $x_4$. Crucially, every inverse here is an inverse of a small number modulo a small prime.
Notice that nowhere in this procedure did we need to use big ints whatsoever! We only need big ints to compute $x$ itself from $$x \equiv x_1 + p_1x_2 + p_1p_2x_3 + p_1p_2p_3x_4 \pmod{p_1p_2p_3p_4},$$ and here, the most complex operation we need is just multiplication! We don't even need to implement full-fledged big-int multiplication, because each multiplicand is around $10^9$, so we only need to implement multiplication of a big number by a "small" number. This is the main advantage of Garner's algorithm. The running time of this algorithm is $O(m^2)$, as long as we evaluate the Garner relation Horner-style as $$x \equiv x_1 + p_1(x_2 + p_2(x_3 + p_3 x_4)) \pmod{p_1p_2p_3p_4}.$$

Time Complexity: $O(N^3 \log K + N^4)$

AUTHOR'S AND TESTER'S SOLUTIONS:
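The whole procedure for general $m$ can be sketched compactly (illustrative Python; in Python the big-int arithmetic of the final Horner evaluation is built in, while in C++ one would hand-roll big-number-times-small-number multiplication as described above):

```python
def garner(remainders, primes):
    """Reconstruct x with 0 <= x < prod(primes) from x ≡ r_i (mod p_i),
    where the primes are distinct. Uses Garner's mixed-radix form:
    all modular inverses are of small numbers modulo small primes."""
    m = len(primes)
    xs = []  # mixed-radix digits, 0 <= xs[i] < primes[i]
    for i in range(m):
        pi = primes[i]
        # Evaluate the already-known prefix x_1 + p_1 x_2 + ... modulo p_i.
        val, prefix = 0, 1
        for j in range(i):
            val = (val + xs[j] * prefix) % pi
            prefix = prefix * primes[j] % pi   # p_1 * ... * p_j mod p_i
        # Solve x_i = (r_i - prefix value) * (p_1...p_{i-1})^{-1} mod p_i.
        inv = pow(prefix, pi - 2, pi)          # Fermat inverse (p_i prime)
        xs.append((remainders[i] - val) * inv % pi)
    # Horner-style evaluation x = x_1 + p_1(x_2 + p_2(x_3 + ...)).
    # This is the only step involving big integers.
    x = 0
    for i in reversed(range(m)):
        x = x * primes[i] + xs[i]
    return x
```

The double loop over digits gives the $O(m^2)$ running time mentioned above.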
asked 10 May '16, 05:32
