BUGATACK - Editorial

kevinsogo · June 21, 2015, 10:13pm

PROBLEM LINK:

Author: Utkarsh Lath
Tester: Kevin Atienza
Editorialist: Kevin Atienza

PREREQUISITES:

Dynamic programming, combinatorics, graphs, Floyd-Warshall algorithm

PROBLEM:

The Floyd-Warshall algorithm can compute the transitive closure of a graph using N passes of a certain augmentation method. Suppose we only run the first N-K such passes. What is the number of simple, undirected, unweighted graphs where the transitive closure is still computed correctly?

QUICK EXPLANATION:

After the k th pass of Floyd-Warshall, A[i][j] contains 1 if i and j are connected via a path whose intermediate nodes have indices \le k. Therefore, the question is simply: how many simple, undirected, unweighted graphs are there such that connectivity can be determined using only the first N-K vertices as intermediate nodes?

Let’s call a node whose index is \le N-K a good node, and > N-K a bad node. We can show that the following are necessary for the transitive closure to be computed using the first N-K passes:

Consider a bad node x that is not connected to any good node. Then any other bad node adjacent to x is also not adjacent to a good node. Furthermore, for any two bad nodes y and z adjacent to x, y and z are also adjacent to each other. In other words, all nodes connected to x form a complete graph of bad nodes.
Consider a bad node x that is connected to some good node. Then for any two good nodes r and s connected to x, there is a path from r to s only passing through good nodes. In other words, all good nodes connected to x form a connected graph of good nodes.
Consider a bad node x that is connected to some good node, and let S be the set of all good nodes connected to x (by the above, the nodes in S form a connected graph). If y is another bad node that is connected to x, then y must be connected to some node in S.

Amazingly, these three conditions are sufficient! Therefore, we can count the graphs satisfying the above two properties to come up with the following counting formula:

F(N,K) = \sum_{b=1}^K {K-1 \choose b-1} F(N-b,K-b) + \sum_{b=1}^K \sum_{g=1}^{N-K} {N-K \choose g} {K-1 \choose b-1} F(N-g-b,K-b) C_g (2^g-1)^b 2^{b(b-1)/2}

with base case

F(N,0) = 2^{N(N-1)/2}

The meanings of certain things in this formula are:

F(N,K) is the number of solutions.
C_g is the number of connected graphs of size g.

EXPLANATION:

For a given graph, the Floyd-Warshall algorithm computes the shortest path between any pair of nodes in O(N^3) time, where N is the number of nodes. This can be modified to produce the transitive closure of the graph, and in fact this is the “Floyd-Warshall algorithm” presented in the problem. The transitive closure of a graph is the graph with the same set of nodes for which there is an edge from i to j if and only if there is a path from i to j in the original graph. In terms of undirected graph, this is true if and only if i and j are connected.

The question is: assuming we run only the first N-K passes of the algorithm, how many simple undirected graphs of N labelled nodes are there such that the modified algorithm still returns the transitive closure? In order to answer the question, we must understand how exactly this algorithm works. The following is the pseudocode of the algorithm:

for (int p = 1; p <= N; p++) {
	for (int i = 1; i <= N; i++) {
		for (int j = 1; j <= N; j++) {
			conn[i][j] = conn[i][j] || (i != j && conn[i][p] && conn[p][j]);
		}
	}
}

The algorithm works in N passes. After the p th pass, conn[i][j] is true if and only if i \not= j and there is a path from i to j whose intermediate nodes are in \{1, 2, \ldots, p\} (before the first pass, conn[i][j] is true if and only if there is an edge between i and j). This means that after N passes, conn[i][j] will be true if and only if i \not= j and there is a path from i to j. It should be intuitive how the inner two loops compute conn[i][j] properly.

Now, what happens if we only run the first N-K passes? This means that we are only allowing the nodes \{1, 2, \ldots, N-K\} to be intermediate nodes. Let’s call these the good nodes, and the nodes \{N-K+1, \ldots, N\} the bad nodes. If we want the modified algorithm to still correctly compute the transitive closure, we need to ensure that *there is no pair of nodes (i,j) such that all paths from i to j pass through some bad node. This means that, among other things, we must ensure that the following holds in our graph:

Consider a bad node x that is not connected to any good node. Then any other bad node adjacent to x is also not adjacent to a good node. Furthermore, for any two bad nodes y and z adjacent to x, y and z are also adjacent to each other (otherwise the algorithm doesn’t give the correct result for the pair (y,z)). In other words, all nodes connected to x form a complete graph of bad nodes.
Consider a bad node x that is connected to some good node. Then for any two good nodes r and s connected to x, there is a path from r to s only passing through good nodes (otherwise the algorithm doesn’t give the correct result for the pair (r,s)). In other words, all good nodes connected to x form a connected graph of good nodes.
Consider a bad node x that is connected to some good node, and let S be the set of all good nodes connected to x (by the above, the nodes in S form a connected graph). If y is another bad node that is connected to x, then y must be connected to some node in S (otherwise, the algorithm doesn’t give the correct result for the pair (y,s) for any node s in S).

Amazingly, the things we mentioned above exhaust all possibilities for pairs in which the modified algorithm returns the wrong answer! In other words, the answer is the number of simple undirected graphs satisfying all the above properties. The general picture is described in the following:

The green and red nodes are the good and bad nodes, respectively. The bad nodes are grouped into two types: those connected to some good node and those not connected to any good node. Notice that in the first type, the bad nodes form a bunch of complete graphs, while in the second type, they are attached to a single set of connected good nodes.

This can be apparent after a while of thinking. We can also prove it formally:

Proof:

Let’s call a graph friendly if the modified “Floyd-Warshall algorithm” (the one run only for the first N-K passes) correctly returns the transitive closure.

It is apparent above that the above properties are necessary for a graph to be friendly, so we only have to prove that they are also sufficient. To do this, we have to show that any pair of nodes (a,b) for which the modified algorithm returns an incorrect answer violates one of the properties above.

Let’s consider first the case that both a and b are good nodes. In this case, all paths from a to b must pass through some bad node, which violates the second property above. If a is good and b is bad (or vice versa), then all paths from a to b pass through some other bad node. This means that b is not adjacent to any good node connected to a, violating the third property above.

The only remaining case is when both a and b are bad nodes. There are two cases: whether there is a good node connected to a or not. If there is any good node connected to a, then in order for the modified algorithm to be wrong, any path from a to b must pass through some other bad node. But this means that the good nodes connected to a do not form a connected graph, violating the second property above. If there aren’t any good nodes connected to a, then the path from a to b must consist entirely of bad nodes, but if the first property above must hold, then you can conclude that there must be an edge from a to b, which is a contradiction. Therefore, the first property must be violated.

This is great, since we know what the graphs we are counting look like! The properties above make it simple enough to count the graphs: Let F(N,K) be the solution for a given (N,K) pair. Then we have:

Consider the smallest-indexed bad node, say x. Suppose there are g good nodes and b bad nodes that are connected to x (0 \le g \le N-K, 1 \le b \le K). Note that there are {N-K \choose g}{K-1 \choose b-1} ways to select all of these nodes, and there are (recursively) F(N-g-b,K-b) ways to select the edges for the remaining nodes.

If g = 0, then there are no good nodes, and by the first property, the b bad nodes must form a complete graph. There is only one way to build that complete graph on b nodes.

If g > 0, then the last two properties apply. First, we must ensure that the g good nodes form a connected graph by themselves. Let C_g be this number, i.e. the number of simple undirected connected graphs on g labelled nodes. Next, we must ensure that each of the b bad nodes is connected to at least one of the g good nodes. There are 2^g-1 ways to select a nonempty subset of the good nodes, so there are (2^g-1)^b choices for all bad nodes. Finally, we can freely add edges between the bad nodes, and there are 2^{b(b-1)/2} ways to choose that. Therefore, we have the following recurrence for F(N,K):

F(N,K) = \sum_{b=1}^K {K-1 \choose b-1} F(N-b,K-b) + \sum_{b=1}^K \sum_{g=1}^{N-K} {N-K \choose g} {K-1 \choose b-1} F(N-g-b,K-b) C_g (2^g-1)^b 2^{b(b-1)/2}

For the base case K = 0 (i.e. there are no bad nodes), all graphs are counted, therefore:

F(N,0) = 2^{N(N-1)/2}

Precomputing all the F(n,k)'s for k \le n \le N that fit within the bounds of the problem takes O(N^4) time!

Finally, we need to discuss how to compute C_g, the number of connected graphs with g nodes. First, note that there are 2^{g(g-1)/2} graphs on g nodes. Let’s consider the connected component that node g belongs to. Assume that this connected component contains k nodes (1 \le k \le g). How many such graphs are there? There are {g-1 \choose k-1} ways to choose these nodes. There are also C_k ways to choose the edges among the k nodes so that they’re connected, and there are 2^{(g-k)(g-k-1)/2} ways to select the edges among the remaining nodes. Therefore, we have the following equality:

2^{g(g-1)/2} = \sum_{k=1}^g {g-1 \choose k-1} C_k 2^{(g-k)(g-k-1)/2}

By a simple rearrangement, we get the following recurrence for C_g:

C_g = 2^{g(g-1)/2} - \sum_{k=1}^{g-1} {g-1 \choose k-1} C_k 2^{(g-k)(g-k-1)/2}

Thus all the C_g's for g \le N can be computed in O(N^2) time.

Time Complexity:

O(N^4) preprocessing

AUTHOR’S AND TESTER’S SOLUTIONS:

To be uploaded soon.