PROBLEM LINK:Author: Gaoyuan Chen DIFFICULTY:MEDIUMHARD PREREQUISITES:Hall's Marriage Theorem, Linear Programming, Total Unimodularity PROBLEM:Given an undirected graph $G = (V, E)$, find the maximum value of $(\ S \  \ N(S) \)$, where $S$ is an independent set of $G$, and $N(S)$ denotes the neighborhood of $S$. QUICK EXPLANATION:The condition of $S$ being an independent set can be relaxed without affecting the answer (see Lemma 1 below). Hence, the problem reduces to finding the maximum value of $(\ S \  \ N(S) \)$, where $S$ varies over all subsets of $V$. This can be solved either by using Hall's marriage theorem (which reduces the problem to bipartite matching), or by formulating the problem as an integer linear programming, and then use the total unimodularity of the formulated problem (which means the integer constraint can be relaxed, and any linear programming solver can be used). EXPLANATION:We are given an undirected graph $G = (V, E)$ with $N$ vertices and $M$ edges. The objective is to find an independent subset $S$ of $G$, for which the difference between its size and size of its neighborhood is maximum. For a given subset $S$, we can define two values $x_v$ and $y_v$ for each vertex $v$ in the following way: $x_v = 1$, if $v \in S$, and $0$ otherwise. In other words Note that, Since $S$ must be an independent set, hence there should be no vertex $v$ for which both $x_v$ and $y_v$ are $1$ (i.e., $x_v + y_v \leq 1$). Therefore, the task is to pick values $x_1, x_2, \ldots, x_N$ from the set $\{0, 1\}$, such that the constraint about $S$ being an independent set is satisfied, and the value of the objective function $\text{obj} (S) = (x_1 + x_2 + \cdots + x_N)  (y_1 + y_2 + \cdots + y_N)$ is maximum. For smaller value of $N$ this can be done by considering all possible assignments, and pick the one which gives the best value of objective function, as shown below. int best = 0; // iterate through all subsets of V. for (int S = 0; S < (1 << N); ++S) { int x[N] = {0}; int y[N] = {0}; for (int j = 0; j < N; ++j) if (S & (1 << j)) x[j] = 1; for each edge e = (u, v) { y[u] = max(y[u], x[v]); y[v] = max(y[v], x[u]); } int obj = 0; for (int j = 0; j < N; ++j) { // S is not an independent set. if (x[j] == 1 && y[j] == 1) { obj = 1; break; } obj += x[j]  y[j]; } best = max(best, obj); } The complexity of the above code is $O (E2^N)$, and should be enough for the smaller task. One could also reduce the complexity to $O (N2^N)$ by using bitmasks. Next, we discuss how to solve the problem for larger graphs. First we prove that the constraint of S being an independent set can be relaxed without affecting the answer. Lemma 1: For each set $S$, there exist an independent set $S1$, such that $(\ S \  \ N(S) \) \leq (\ S1 \  \ N(S1) \)$.
Proof: We can compute the $x$ and $y$ values for the given set $S$ as described above. The vertices of set $S$, can be split into two sets $A$ and $B$ as follows: Since all neighbors of $A$ lie outside $S$, there cannot be any edge between the vertices in $A$ and $B$, also there would be no edge between two vertices of $A$ (i.e., $A$ is an independent set). Now, if we shrink the set $S$ to $S1 = A$, and compute the new $x'$ and $y'$ values corresponding to this set, we can observe the following: $v \in A \implies (x'_v = 1, y'_v = 0)$ // $S1 = A$ is an independent set This means $(x_v  y_v) \leq (x'_v  y'_v)$, holds for all vertices $v$. Hence, the objective function for this new set $S1$ should be at least as large that of $S$. This completes the proof of the lemma. Based of the lemma, we can relax the constraint of $S$ being an independent set, and just find the maximum value of $(\ S \  \ N(S) \)$. If the maximum value is obtained for a nonindependent set, we can shrink it to an independent set, with the same value of objective function. Next, we present two ways to solve the problem: The first is based on Hall's marriage theorem, and the second is based on linear programming. Hall's Marriage Theorem based Approach:Let us make a bipartite graph $H = (L, R, E)$, where each side contains $N$ vertices: left side containing $(u_1, u_2, \ldots, u_N)$ and right side containing $(v_1, v_2, \ldots v_N)$. For any edge $(a, b)$ in the original graph $G$, we add edges $(u_a, v_b)$ and $(u_b, v_a)$ in the new graph. For any subset $S$ of $L$, the size of $N(S)$ is the same as the size of the neighborhood of the corresponding set in $G$, hence our task reduces to finding a subset $S$ of $L$, for which $(\ S \  \ N(S) \)$ is maximum. Hall's marriage theorem states that in a bipartite graph, the size of the maximum matching is the same as the number of vertices on left side if and only each subset of the vertices on the left side has a neighborhood, whose size is at least as large as this subset, i.e., $(\ S \ \leq \ N(S) \)$, for each subset $S$ of $L$. This means that if we have a perfect matching in $H$, then $(\ S \ \leq \ N(S) \)$ will hold for each subset of $L$, hence, the maximum value of $(\ S \  \ N(S) \)$ will be zero. In fact, we can use a generalized form of Hall’s theorem which establishes a relationship between the size of matching matching of $H$, and the maximum value of $(\ S \  \ N(S) \)$. Lemma 2: In a bipartite graph with $N$ vertices on the left side, the maximum value of $(\ S \  \ N(S) \)$, $S$ being a subset of left side vertices, will be $(N  k)$, where $k$ is the size of the maximum matching (Theorem 6.3 in this paper)
Proof:
We first prove that the $(\ S \  \ N(S) \) \leq (N  k)$. On the other hand, if the maximum value of $(\ S \  \ N(S) \)$ is $x$, then we add $x$ vertices on the right side and connect them with each vertex of the left side, which will increase the neighborhood of each subset $S$ by $x$, i.e., in the new graph each subset of the left side vertices will have a neighborhood at least as large the subset. This means the new graph must have a matching of size of $N$ according to Hall’s theorem. Since, we have added only $x$ vertices, we can increase the size of maximum matching by at most $x$, i.e., $N \leq (k + x)$ (equivalently, $(N  k) \leq x$). This completes the proof of the lemma. This means, in order to solve our problem, we only need to compute the size of maximum matching in the graph $H$, which can be done in $O (EN^{0.5})$ time using HopcroftKarp algorithm. Linear Programming based Approach:In this section, we discuss an approach based on linear programming, which is slower than the maximum matching based solution, but still runs in polynomial time, and will fit in time for the given constraints. Let us formulate our problem in terms of linear constraints. We have discussed in the first section that the problem is equivalent to picking $x_1, x_2, \cdots, x_N$ and $y_1, y_2, \cdots, y_N$ from the set $\{0, 1\}$, such that the following constraints are satisfied: If vertex $i$ has neighbors $j_1, j_2, \cdots, j_k$, then We can write this in terms of linear constraints as follows: We also have boundary constraints on the value of $x_i$’s Note that, the optimization function will decrease if we increase the value of $y_i$’s, hence, we can remove the constraint So, our Integer Linear programming formulation looks as follows: We have $2 (N + M)$ linear constraints, and an extra constraint that all $x_i$ and $y_i$’s must be integers. The extra constraint makes the problem hard. Relaxation of Integer Constraints:Some instances of integer linear programming satisfy the property of being totally unimodular, which allows us to relax the integer constraint without affecting the answer. In fact, it can be proved that our linear programming formulation is totally unimodular using the criteria mentioned in this paper, however, we provide a simpler proof indicating why integer constraints can be relaxed in our linear programming formulation. Lemma 3: Any solution of the above mentioned linear programming can be converted to a solution, where integer constraints are satisfied, and the value of the objective function remains the same. Pick the $x_i$'s which take fractional values, and sort them. Let us say that $x_{i_1}, x_{i_2}, \cdots, x_{i_m}$ are the ones with largest fractional value $f$. Now find the neighbors of vertices $i_1, i_2, \cdots, i_m$, which do not have any neighbor, whose $x$ is assigned value $1$. Let us say that these neighbors are $j_1, j_2, \cdots, j_n$. $x_{i_1} = x_{i_2} = \cdots = x_{i_m} = f$ The second equality holds because each of the $j_k$ is connected to at least one the picked $i_l$ vertex, but to none of the vertex with $x$ value $1$. We can observe that we can increase of decrease the value of $f$ slightly (i.e., $f \pm \epsilon$), and all the constraints will still be satisfied. The contribution of these $x_i$ and $y_j$'s values in the objective function is $(m  n) f$. Now, if $(m > n)$, then we increase the value of $f$ to $f + \epsilon$, without affecting any constraints but increasing the value of the objective function. On the other hand, if $(m < n)$, then we decrease the value of $f$ to $f  \epsilon$, which will increase the objective function. Since, we have the optimal solution of the linear programming, hence none of two cases should happen, i.e., $(m = n)$. In this case, we can increase the value of $f$ to $1$, without affecting the constraints or objective value. The same process can be repeated to convert the remaining fractional values to integer ones without affecting the value of objective function. This completes the proof of the lemma. The implication of the above lemma is that we can now use any linear programming solver, e.g., Simplex method, to solve our problem. Reduction in the Number of Constraints:The current formulation has $2N$ variables and $2 (N + M)$ constraints, which we can be reduced into half as many variables and constraints. Suppose we found a solution of the linear programming. We can partition the vertices into $4$ sets $A, B, C, D$ according to their $x$ and $y$ values. $v \in A \iff (x_v = 1, y_v = 0)$ According to Lemma $1$, we can remove the set $B$ by assigning $x_v = y_v = 0$ in this set. So now, we have only sets $A$, $C$, and $D$. Note that, the set $C$ corresponds to those vertices, which lie outside our chosen set, and also do not have any neighbors in the chosen set. We can actually change the values of $x_v$ and $y_v$ of this set to $0.5$ without affecting any constraint (This is because all neighbors of this set will have $y$ value either $0.5$, or $1$, which is no less than $x$ value of any vertex of the set). Hence, $v \in A \iff (x_v = 1, y_v = 0)$ However, now we have an additional property in the solution that $(x_v + y_v) = 1$, for each vertex. This means any solution of our linear programming solution can be converted into one satisfying this property. In other words, we can add this extra constraints without affecting the value of objective function. The implication of this additional constraint is that we can remove the variable $y_i$'s and replace them by $(1  x_i)$'s. After the replacement, we get the following instance. This formulation has only $N$ variables and $2 N + M$ constraints. Also one can notice that this formulation is the same as the one used for computing the maximum independent set of a graph. AUTHOR'S AND TESTER'S SOLUTIONS:
This question is marked "community wiki".
asked 11 May '15, 01:47

My solution was: 1) Check whether any vertex has exactly 0 or 1 neighbour. 2) If there exist 0 neighbour, increment the answer by 1 and remove the vertex from consideration. 3) If 1 neighbour, remove the two vertices(said vertex and its neighbour) from consideration. 4) Repeat these steps until either all vertices have been removed or no vertices with exactly 0 or 1 neighbours can be found. Return answer. This gave correct answer for 8 out of the 10 test cases but failed for the last two. Can anyone help? answered 18 May '15, 15:49

@d_taxman: Your approach will fail, for example, for the graph containing 5 nodes (1..5) that has following chains of edges: 125, 135, 145. Your algorithm choose no nodes to be in SEZ, as all nodes have at least 2 neighbours, but the coorect solution is to choose nodes 2, 3, 4 to be in SEZ, that gives happiness equal to +1. answered 18 May '15, 16:25

My approach is same as d_taxman but instead of checking only exactly 0 and 1 neighbour, i did something different. But still it gives wrong answer on last test case. My approach is as follow: first insert all vertices with its degree in a set.After that repeat while loop until set becomes empty. first of all get the first element from set(which is obvious having lowest degree) and increase x by 1 and mark all its neighbour(with increase y by one each time) and also marked popped node. Then for each unmarked node which is neighbour of popped node neighbour, decreases its degree by one each time and reinsert it in to set. And each time i take maximum of xy. My code is  vector<int> V[505]; int deg[505]; bool chk[505]; int n,m; int main() { int u,v; scanf("%d %d",&n,&m); for(int i=0;i<m;i++) { scanf("%d %d",&u,&v); V[u].push_back(v); V[v].push_back(u); deg[u]++; deg[v]++; }
} please tell me where am I wrong and if possible give testcases on which my code fails. answered 19 May '15, 11:59

@asvikr Your approach will fail, for example, for the graph containing 5 nodes (1..5) that has following chains of edges: 125, 135, 145. Your algorithm choose no nodes to be in SEZ, as all nodes have at least 2 neighbours, but the coorect solution is to choose nodes 2, 3, 4 to be in SEZ, that gives happiness equal to +1. answered 19 May '15, 14:39

Fantastic problem and editorial! For those wondering, this: "Also one can notice that this formulation is the same as the one used for computing the maximum independent set of a graph." doesn't mean that the problem is equivalent to Maximum Independent Set (that would be impossible because MIS is NPhard). The reason is that MIS contains another requirement: the solution must be integer; and this property is lost in the last step, when we go from 2N variables to N variables (an optimal solution can now have values equal to 0.5). So this shows that relaxed linear programming problem corresponding to MIS is equivalent to maximum bipartite matching. Is this a new result? answered 20 May '15, 23:44
