CHN06 - Editorial



Author: ???
Tester: Kevin Atienza, Jingbo Shang
Editorialist: Kevin Atienza


Graph traversal, perfect matching


Given a weighted bipartite graph with nodes \{1\ldots 2n\} with partites \{1\ldots n\} and \{n+1\ldots 2n\} and cost function \mathop{c}(u,v) (for 1 \le u \le n and n+1 \le v \le 2n), extend the graph into a complete bipartite graph so that every perfect matching has the same cost. It is guaranteed that there is a unique way to do this.

What is the sum of the squares of the edge costs, modulo 10^9 + 7?


The extended cost function must satisfy \mathop{c}(u,v) + \mathop{c}(w,x) = \mathop{c}(u,x) + \mathop{c}(w,v) for all u, v, w, x. (in the appropriate ranges). So for all u, v:

\mathop{c}(u,v) = \mathop{c}(u,n+1) + \mathop{c}(1,v) - \mathop{c}(1,n+1)

So we must find \mathop{c}(1,v) and \mathop{c}(u,n+1) for all u, v.

To get \mathop{c}(1,v) for all v:

  • Start with any known \mathop{c}(1,v_1) (which is guaranteed to exist in the input).
  • For i = 2,\ldots,n, find a node v_i such that there exists an older vertex v_j and a u adjacent to both v_i and v_j. We can now compute \mathop{c}(1,v_i) as \mathop{c}(1,v_i) = \mathop{c}(1,v_j) + \mathop{c}(u,v_i) - \mathop{c}(u,v_j).

The procedure above can be done in O(E) time using BFS/DFS on the nodes \{n+1\ldots 2n\}. A similar algorithm can be done to get \mathop{c}(u,n+1) for all u. The answer is now:

\sum_{u,v} \mathop{c}(u,v)^2 = n U_2 + 2 U_1 V_1 + n V_2


\begin{aligned} U_1 &= \sum_u [\mathop{c}(u,n+1) - \mathop{c}(1,n+1)] \\ U_2 &= \sum_u [\mathop{c}(u,n+1) - \mathop{c}(1,n+1)]^2 \\ V_1 &= \sum_v \mathop{c}(1,v) \\ V_2 &= \sum_v \mathop{c}(1,v)^2 \end{aligned}

All sums can be computed in O(n) time.



Our goal is to compute

\sum_{u,v} \mathop{c}(u,v)^2

where \mathop{c}(u,v)^2 is the extended cost function among all pairs of nodes (u,v) with 1 \le u \le n and n+1 \le v \le 2n. (Don’t forget to reduce all computations modulo 10^9 + 7.)

Let’s try to find some properties of the cost function \mathop{c}(u,v). There are a lot of matchings. In a complete bipartite graph with n nodes on each partite, there are n! perfect matchings. But many of these matchings share the same set of edges.

Let’s look at a particular matching. Let’s say that this matching pairs u to v and u' to v', along with n-2 other pairings. We can form another matching by swapping these two pairs, i.e. by pairing u to v' and u' to v, and keeping all the other n-2 pairings. These two matchings must have the same cost, so the following statement must be true:

\mathop{c}(u,v) + \mathop{c}(u',v') = \mathop{c}(u,v') + \mathop{c}(u',v)

Note that this must hold true for all nodes u,u' \in \{1\ldots n\} and v,v' \in \{n+1,\ldots 2n\}. These are necessary conditions for our graph to have the same cost for all perfect matchings.

Actually, they’re also sufficient conditions! To see why, note that you can transform any perfect matching to any other perfect matching by a series of swappings like above. If \mathop{c}(u,v) + \mathop{c}(u',v') = \mathop{c}(u,v') + \mathop{c}(u',v) is true for all nodes, then all intermediate perfect matchings have the same cost, so any two perfect matchings have the same cost.

We can view this equation in terms of rectangles in a matrix. Let’s build an n\times n matrix such that rows and columns are labelled \{1 \ldots n\} and \{n+1 \ldots 2n\}, respectively, and row u and column v contains c(u,v). Then the vertices (u,v), (u',v'), (u,v') and (u',v) are the corners of a submatrix, and the equation above says that opposite corners of the submatrix have the same sum.

This equation will help us infer the values of the missing \mathop{c}(u,v) values. If we know the other three values, then we can get \mathop{c}(u,v) as \mathop{c}(u,v') + \mathop{c}(u',v) - \mathop{c}(u',v'). Now, if we can compute all missing \mathop{c}(u,v) this way then we can construct the requested complete graph. The problem with this approach is that it takes \Omega(n^2), because there are n^2 edges to compute. Thus, we must somehow be able to compute the required sum without computing the edges themselves.

We can do this by noticing that all edge costs are uniquely identified by the values of the first row and column of the n\times n matrix:

\mathop{c}(u,v) = \mathop{c}(u,n+1) + \mathop{c}(1,v) - \mathop{c}(1,n+1)

Thus all we need is to compute \mathop{c}(u,n+1) for all u and \mathop{c}(1,v) for all v. To simplify things, we will write R_u = \mathop{c}(u,n+1) and C_v = \mathop{c}(1,v), so we have

\mathop{c}(u,v) = R_u + C_v - R_1.

We will further define D_u = R_u - R_1 so this becomes

\mathop{c}(u,v) = D_u + C_v.

Using this, we can now compute the answer quickly.

\begin{aligned} \sum_{u,v} \mathop{c}(u,v)^2 &= \sum_{u,v} \left(D_u + C_v\right)^2 \\ &= \sum_{u,v} \left(D_u^2 + 2D_uC_v + C_v^2\right) \\ &= \sum_{u,v} D_u^2 + \sum_{u,v} 2D_uC_v + \sum_{u,v} C_v^2 \\ &= n \sum_{u} D_u^2 + 2\left[\sum_{u} D_u\right]\left[\sum_{v} C_v\right] + n \sum_{v} C_v^2 \end{aligned}

All these sums can be computed in O(n) time each. So we now focus on computing all $R_u$s and $C_v$s.

Computing all $C_v$s and $R_u$s

Our goal is to compute C_v for all n+1 \le v \le 2n. First, we will rearrange

\mathop{c}(u,v) + \mathop{c}(u',v') = \mathop{c}(u,v') + \mathop{c}(u',v)

to get

\mathop{c}(u,v) - \mathop{c}(u,v') = \mathop{c}(u',v) - \mathop{c}(u',v')

What this means is that for every two columns v and v', the difference between two elements in the same row is constant.

This fact is very useful. Let’s call this common difference \Delta[v,v']. This function has the following nice property (for any three columns v_1, v_2 and v_3):

\Delta[v_1,v_3] = \Delta[v_1,v_2] + \Delta[v_2,v_3].

This equation allows us to infer \Delta values between two columns based on existing ones. Namely, suppose we want to know \Delta[v_1,v_k]. If we can find a sequence of columns v_1, v_2, \ldots, v_k such that we know the \Delta for every two consecutive columns, then we can compute \Delta[v_1,v_k] as simply \Delta[v_1,v_2] + \Delta[v_2,v_3] + \ldots + \Delta[v_{k-1},v_k]!

We can view the \Delta[v,v'] relationships as a graph itself, say G. G is a graph with nodes \{n+1\ldots 2n\} and initially no edges. We we gradually add new edges (v,v') as we happen to discover new values \Delta[v,v']. From G, we can infer a value \Delta[v,v'] even if there is no edge from v to v', as long as there is a path from v to v'. Thus, if all nodes of G become connected, then we can infer every value of \Delta[v,v']!

To compute the $C_v$s themselves, we start from a C_v (= \mathop{c}(1,v)) that we already know. (Such a value must exist, otherwise the uniqueness guarantee is violated.) Then for any other column v', we can use the formula
C_{v'} = C_v + \Delta[v,v']. If G is connected, then all $C_v$s can be computed this way.

Checking connectivity is easy, as long as we start with a few $\Delta[v,v’]$s that we already know. But we can easily find such values. For example, let’s fix a row u, and let v_1,v_2,\ldots, v_k be the columns such that \mathop{c}(u,v_i) is known for all 1 \le i \le k. Then we can easily compute \Delta[v_i,v_j] for every pair 1 \le i, j \le k by taking the appropriate differences!

So our algorithm now to compute the $C_v$s would be:

  • Initialize G to be a graph on nodes \{n+1,\ldots 2n\} with no edges.
  • Find a known C_v (= \mathop{c}(1,v)), and call this v_{\text{start}}. It is guaranteed to exist.
  • For every u \in \{1\ldots,n\}, let v_1,v_2,\ldots, v_k be the columns such that \mathop{c}(u,v_j) is known. Then for every 1 \le j \le k, add the edge (v_1,v_j) in G, with weight \mathop{c}(u,v_j) - \mathop{c}(u,v_1).
  • Start a BFS/DFS from v_{\text{start}}. For each newly visited node v', we can then infer the value C_{v'} using the formula C_{v'} = C_v + \Delta[v,v'], where v is the parent of v' in the traversal.

Clearly this runs in O(E) time, and outputs the correct C_v values! (Note that in the third step, we only add the edges (v_1,v_j) and not (v_i,v_j) for other i, otherwise there might be around k^2/2 edges to add, which is \Theta(n^2) in the worst case.)

Computing all $R_u$s is similar and is symmetric to the C_v case. You just swap the roles of rows and columns. Thus, the overall algorithm runs in O(E)!

Note that you can also use a disjoint set forest to determine the $C_v$s. You just have to augment each number so that it stores the the \Delta value of itself with its parent in the disjoint set forest, and update it when needed.

Time Complexity:




[333]: The link is provided by admins after the contest ends and the solutions are uploaded on the CodeChef Server.
[444]: The link is provided by admins after the contest ends and the solutions are uploaded on the CodeChef Server.
[555]: The link is provided by admins after the contest ends and the solutions are uploaded on the CodeChef Server.