GRAPHCNT - Editorial

djdolls · May 16, 2015, 9:11pm

PROBLEM LINK:

Author: Yuyang Huang
Tester: Kevin Atienza
Editorialist: Ajay K. Verma
Russian Translator: Sergey Kulik
Mandarian Translator: Gedi Zheng

DIFFICULTY:

MEDIUM-HARD

PREREQUISITES:

Dominator, Depth First Search

PROBLEM:

Given a directed graph G = (V, E), and an entry vertex r, find the number of pairs of vertices (x, y) such that there exist paths r \longmapsto x and r \longmapsto y having no common vertex other than r.

QUICK EXPLANATION:

If two vertices x and y have a common dominator other than r, then this dominator will be present in all paths from r to x and from r to y. On the other hand, if the only common dominator of x and y is r, then we can find paths r \longmapsto x and r \longmapsto y, which share no vertex other than r. Hence, if we compute the dominator tree of G using the Lengauer and Tarjan algorithm, the task reduces to find the pair of vertices in the tree, which have a common ancestor other than r, and can be done in linear time.

EXPLANATION:

We are given a directed graph G = (V, E), and an entry vertex r. Now, for a given vertex x, consider all paths from r to x. Some vertices will occur in all these paths (e.g., r, x). These vertices are called the dominator of x. More formally, if a vertex d is present in all paths from r to x, then d is a dominator of x. The nodes r and x are the trivial dominators of x, other dominators of x are called non-trivial dominator.

It can be seen that being dominator is a transitive relationship, i.e., if x is a dominator of y, and y is a dominator of z, then x must also be a dominator of z. This is because any path r \longmapsto z must contain vertex y, however, the path from r to y must pass through x. Therefore, any path from r to z must contain the vertex x, i.e., x is a dominator of z.

On the other hand, if a vertex z has two dominators x and y, then x and y must have a dominator relationship between them. We can prove this using contradiction. Consider a path from r to z, this path must pass through both x and y, as they are dominator of z. Let us say that y occurs later in this path, i.e., there is a path \it{p} from y to z, which does not contain x. Now, if x is not a dominator of y, then there must be a path \it{q} from r to y, which does not contain x, if we concatenate the two paths \it{q} and \it{p}, we will have a path from r to z, which does not contain x. This is a contradiction, as x is a dominator of z. Hence, x must occur in every path from r to y, i.e., x is a dominator of y.

Based on these two facts, we can represent dominator relationship using a tree rooted at r. If x is a dominator of y, then there must be a path from x to y in this tree. This tree is called the dominator tree of G. The dominator of a vertex x will be all vertices lying in the path from r to x in the dominator tree. The immediate dominator of a node x is its parent in the dominator tree.

Next, we show how to check whether for two given vertices x and y, there exist paths r \longmapsto x and r \longmapsto y having no common vertex other than r.

Lemma: For two vertices x and y, there exist paths r \longmapsto x and r \longmapsto y having no common vertex other than r if and only if x and y have a common ancestor in the dominator tree other than r.
Proof: Assume that x and y have a common ancestor z \neq r in the dominator tree. This means z is a dominator of both x and y, i.e., both paths r \longmapsto x and r \longmapsto y must contain the vertex z.

On the other hand, let us assume that x and y have no common ancestor other than r. The dominators of vertex y are r = v_1, v_2, \cdots, v_k = y, in the order of distance from r in the dominator tree. Hence, any path from r to y, will be a concatenation of paths v_1 \longmapsto v_2, v_2 \longmapsto v_3, \cdots, v_{k - 1} \longmapsto v_k. There must be a path \it{p} from r to x, which does not contain the dominator v_2. Note that any vertex in the path v_2 \longmapsto v_k has v_2 as its dominator, hence the intersection of \it{p} and any path v_2 \longmapsto v_k must be empty, otherwise \it{p} would need to contain the dominator vertex v_2.

Now, there are two possible cases:

when there is a unique path from v_1 to v_2 which is the edge (v_1, v_2). In this case we have path \it{p} and v_1 \mapsto v_2 \longmapsto v_k, which do not share any vertex other than r = v_1.
In the second case, we have more than one path from v_1 to v_2. Since v_2 has no dominator other than v_1, we cannot disconnect v_1 from v_2 by removing only a single vertex other than v_1, v_2. Hence, according to Menger’s Theorem, there must be two vertex disjoint paths between v_1 and v_2 (excluding the endpoints). Let us call them \it{q_1} and \it{q_2}. If the intersection of \it{p} with any of the \it{q_1} or \it{q_2} is empty, then we already have two vertex disjoint paths \it{p} = r \longmapsto x and r \longmapsto y. Now if, \it{p} has a nonempty intersection with both \it{q_1} and \it{q_2}.

\it{p} \cap \it{q_1} = \{u_1, u_2, \cdots , u_m\}
\it{p} \cap \it{q_2} = \{w_1, w_2, \cdots, w_n\}

The two sets \{u_1, u_2, \cdots , u_m\} and \{w_1, w_2, \cdots, w_n\} are disjoint. Without loss of generality we can assume that w_n is closer to x than u_m in the path \it{p}. Let us denote the subpath r \longmapsto w_n of \it{q_2} by \it{pred}, and the subpath w_n \longmapsto x of \it{p} by \it{succ}. Now consider the following two paths:
r \longmapsto x = \text{concat} (\it{pred}, \it{succ})
r \longmapsto y = \text{concat} (\it{q_1}, v_2 \longmapsto y)

The two paths have no common vertex other than r. Hence, if x and y have no common dominator (except r), then we can find vertex disjoint paths r \longmapsto x and r \longmapsto y.

Algorithm:

Based on the above lemma, we can compute the number of bad pairs (i.e., the pairs which have no vertex disjoint paths from r), if we have the dominator tree available. This is the same as the number of pairs which have a common ancestor other than r. If the children of r in the dominator tree are v_1, v_2, \cdots v_k, and the size of the subtree rooted at these nodes are s_1, s_2, \cdots, s_k, then the number of bad pairs will be {s_1 \choose 2} + {s_2 \choose 2} + \cdots + {s_k \choose 2}. The number of good pairs will be N \choose 2 - number of bad pairs, where N is the number of vertices reachable from r in the graph G.

The sizes s_1, s_2, \cdots, s_k can be computed easily in linear time using a bottom up traversal of the tree. Hence, the only task that remains is to calculate the dominator tree of the graph.

Also, note that we do not need to compute the whole dominator tree, but only the sizes s_1, s_2, \cdots, s_k. In other words, we only need to compute the farthest ancestor (i.e., a dominator d, such that d has no non -trivial dominator other than r) of each vertex in the dominator tree. We call this node as the farthest dominator (denoted as \it{fdom}(.)) of a vertex. More formally,
\it{fdom}(r) = -1
\it{fdom}(x) = d, such that d dominates x, and d has no non-trivial dominators.

If all predecessors of x have the same farthest dominator y , then, this will also be the farthest dominator of x, otherwise the farthest dominator of x will be x itself.

We can group the nodes according to their farthest dominator, and these groups will represent the vertices in the subtrees rooted at v_1, v_2, \cdots, v_k, i.e., children of r in the dominator tree. Hence, we can compute the number of good pairs using the size of these groups.

Farthest Dominator in a DAG:

The computation of farthest dominator can be done easily by traversing the vertices in topological order (i.e., from root to leaves).

\it{fdom}(r) = -1
\it{fdom}(x) = d, if \forall y \in \text{pred}(x), \it{fdom}(y) = d
\it{fdom}(x) = x, otherwise.

This will compute the farthest dominator of all nodes in O (N + E) time.

Farthest Dominator in General Graph:

For the general graphs, we can compute all dominators of a vertex, and then pick the one which has no non-trivial dominator as its farthest dominator.

The computation of all nodes, which have x as their dominator, can be done in O (N + E) using a depth first search. The basic idea to start a depth first traversal from r, however, do not go beyond the node x. All the nodes, which are traversed, will not have x as their dominator, while all other nodes will have x as their dominator.

// In the end, S will contain all nodes which have
// x as their dominator.
S = {1, 2, ..., N};

void dfs(int curr, int x) {
  // curr is reachable, hence it cannot have x
  // as its dominator, remove it from the list.
  remove(S, curr);

  // Do not proceed after reaching x.
  if (curr == x) return;

  foreach successor y of curr {
    dfs(y, x);
  }
}

// This will compute the nodes whose dominator is x.
dfs(r, x);

This approach will compute the dominators of all nodes in O (NE) time, and will work for the smaller subtasks.

Dominator Tree of Large Graphs:

For large graphs we can use the Lengauer and Tarjan algorithm to compute the dominator tree. This approach works in O ((N + E) \log N) time, and, can be improved to work in O (N + E) time using advanced data structures. However, in our problem the O ((N + E) \log N) will suffice.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution
Tester’s solution

snowbear · May 18, 2015, 7:38pm

Great problem! I was unaware of Lengauer and Tarjan algorithm but on the last day was able to finish something which turns out to be similar to it.

There was a similar problem on Codeforces, those who understand Russian might find its editorial easier to get than reading the paper mentioned above. The video editorial for CF problem is here, starting at 1:01:00

mugurelionut · May 19, 2015, 12:17am

A very simple algorithm for computing the dominator tree (which I actually thought of myself some years ago when I first encountered problems involving dominator trees) is:

Start with an initial tree rooted at 1 (e.g. a DFS or BFS tree)
While there are any updates to the tree do:

2.1) Consider every directed edge i -> j

2.2) If i is a descendant of j in the tree, then move on

2.3) Compute x = LCA(i,j) in the current tree

2.4) if x is not j’s parent then disconnect j from its current parent and make it a direct child of x

This algorithm obviously terminates at some point, since nodes only move up the tree and they can’t move beyond being a direct child of the root.

A direct implementation of this is too slow, though, and even when heavily optimized, I still got TLE on one test case (while all the others passed). The tricky part is that the tree is dynamic (nodes change their parents), so the standard LCA methods don’t work. But if you find an efficient data structure which supports the needed operations (hint: Cut a node from its parent, Link it to a different node as parent, and compute LCA between two nodes), then you’re good (of course, such a data structure exists)

tonynater · May 19, 2015, 6:32am

In reply to mugurelionut, I tried doing this method, but don’t believe it is correct for a DFS tree. It is because I don’t think the updates work independently of each other.
Consider the graph here: alt text

Let’s say the filled edges are the initial dfs tree, and the dashed lines are the updates. If we process 5->4 first and then 1->5, we will get the wrong dominator tree.

mugurelionut · May 19, 2015, 11:48am

@tonynater: You are right about the order. However, notice that my algorithm is inside a while loop: “While there are any updates to the tree”. This means you may iterate over all the edges multiple times, until finally, in one of the iterations there are no more updates. In you example, let’s say you consider the edges in the order 5->4, 1->5. In the first iteration 4 is reattached to 2 and 5 is reattached to 1. In the 2nd iteration, the edge 5->4 will change 4’s parent again, to 1. In the 3rd iteration there are no more updates, so stop.

mugurelionut · May 19, 2015, 11:55am

@tonynater: To continue my previous comment, I think there are edge orders which may force O(N) iterations of my algorithm, which would make my solution very slow. But if you sort the edges somehow (e.g. maybe edges with endpoints closer to the root first, or even shuffling them randomly), I’m pretty sure the number of iterations will be reasonably small. I’ve got no formal proof for this, though - just my intuition.