UNRCOST - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: gunpoint_88
Tester: jay_1048576
Preparer & Editorialist: iceknight1093

DIFFICULTY:

2753

PREREQUISITES:

Finding SCCs, binary search

PROBLEM:

You’re given a directed graph with N vertices and M edges.
The cost of a subset of vertices is defined to be the number of pairs of vertices in the subset that aren’t mutually reachable from each other.

Answer Q queries.
For each query, given K, find the maximum size of a subset whose cost doesn’t exceed K.

EXPLANATION:

The cost of a subset S is the number of pairs of vertices in the subset that aren’t mutually reachable from each other.
Turning that around, if S has size M and there are x pairs of vertices in S that are mutually reachable from each other, then cost(S) = \binom{M}{2} - x.

The latter condition is a bit simpler to deal with because it’s more familiar.
In particular, we know that in a directed graph, two vertices are mutually reachable from each other if and only if they lie in the same strongly connected component (henceforth, SCC).
This hints to us that we should look at the SCCs of the given graph.

In particular, notice that we can take all the vertices from any single SCC, for zero cost.
However, if we have vertices from different SCCs, the cost will always be positive—intuitively, it seems better to not choose vertices from too many different SCCs, because the cost is just the number of pairs of vertices belonging to different SCCs.

Notice this in fact tells us the answer for K = 0: the size of the largest SCC.
Once again, this is a hint that the sizes of the SCCs are also important, not just the SCCs themselves.

So, let’s first find all the SCCs of the given graph, and also their sizes.
This can be done in \mathcal{O}(N + M) time using some different algorithms: for example, Kosaraju’s algorithm or Tarjan’s algorithm.

Suppose there are r SCCs, denoted C_1, C_2, \ldots, C_r.
Let their sizes be s_1, s_2, \ldots, s_r; and for convenience, let’s have s_i \geq s_{i+1}.

Our look at K = 0 above gives us a natural-sounding greedy algorithm:

  • First, take all the vertices belonging to C_1, the largest SCC, for a cost of 0.
  • Then, take as many vertices as we can from C_2, as long as the cost doesn’t exceed K.
  • Then, do the same to C_3, C_4, \ldots in order.

That is, take vertices from SCCs in descending order of size.

It turns out that this greedy is indeed optimal!

Proof

The proof of this greedy relies on a couple of simpler claims.

Consider a solution set S.
We’ll call an SCC full if all its vertices are in S, and empty if none of its vertices are.

Lemma: There exists a solution such that at most one SCC is neither full nor empty.
Proof: This is not hard to see.
Suppose there are two non-full, non-empty SCCs with respect to S.
Say one of them has x_1 vertices chosen from it, and the other has x_2, where x_1 \leq x_2.
Choosing (x_1-1, x_2+1) vertices instead gives us a strictly smaller score for the same number of chosen vertices.
Repeat this process as long as you can; it will terminate after a finite number of steps.
Once the process terminates, we’re left with at most one non-full, non-empty SCC, as claimed.

Lemma 2: The full SCCs will form a prefix of the SCC sorted by size (in descending order).
Proof: Once again, this isn’t hard to prove.
Suppose an SCC with size s_1 is full and an SCC with size s_2 is not full, where s_1 \lt s_2.
Suppose x_2 \lt s_2 vertices are chosen from the second SCC.
Then, there are some cases:

  • If x_2 = 0, move all chosen vertices to the second SCC instead.
    This doesn’t change the answer, but does ensure that the set of full SCCs is closer to being a prefix.
  • If 1 \leq x_2 \leq s_1, just swap the number of vertices chosen from both SCCs.
    This doesn’t change the answer at all; but now there are two non-full SCCs, and the construction from the previous lemma will make the smaller one empty first.
    In particular, once again we’re closer to the full SCCs forming a prefix.
  • If x_2 \gt s_1, it can be seen that moving one vertex from the first SCC to the second reduces the score, so keep doing this as long as possible.
    EIther the smaller SCC becomes empty, or the bigger one becomes full - either one is good for us.

Finally, we have some vertices from a non-full SCC.
It’s obviously optimal to take them all from a single SCC; and it doesn’t quite matter which one is chosen here so without loss of generality, we can just take them from the largest remaining one.

This proves the greedy’s optimality.


Since there are multiple queries, we need to be able to perform this greedy algorithm quickly.
To do that, notice that our solution has a somewhat predictable structure:

  • First, we take all vertices from SCCs C_1, C_2, \ldots, C_l for some l.
  • Then, we take some (but not all) vertices from C_{l+1}

Let’s compute for each part separately.

First, we need to find the largest l so that the cost of taking the SCCs C_1, C_2, \ldots, C_l doesn’t exceed k.
The cost of this choice is the number of pairs of vertices belonging to different SCCs.
Since we know sizes already, this is easily seen to be \sum_{i=1}^l \sum_{j=i+1}^l s_is_j.
This cost can in fact be precomputed using prefix sums, as follows:

  • Let P_i denote the cost of taking the first i SCCs.
  • Then, P_i = P_{i-1} + s_i\cdot (s_1 + s_2 + \ldots + s_{i-1}).
  • So, by maintaining prefix sums of the P and s arrays, all costs can be computed in \mathcal{O}(r) time.

Once costs are precomputed, finding the largest l is easy: just binary search on these costs to find the largest l such that P_l \leq K.

Next, we can still take some vertices from C_{l+1} (if it exists).
We’ve already incurred a cost of P_l so far, so we’re left with K - P_l to work with.
Note that each new vertex we take increases the cost by exactly s_1 + s_2 + \ldots + s_l.
So, the number of vertices we can take is exactly \left \lfloor \frac{K - P_l}{s_1 + s_2 + \ldots + s_l} \right\rfloor.

That’s the entire solution!
After some precomputation, each query is answered in \mathcal{O}(\log N) time since we only perform a single binary search.

TIME COMPLEXITY

\mathcal{O}(N\log N + M + Q\log N) per testcase.

CODE:

Tester's code (C++)
/*...................................................................*
 *............___..................___.....____...______......___....*
 *.../|....../...\........./|...../...\...|.............|..../...\...*
 *../.|...../.....\......./.|....|.....|..|.............|.../........*
 *....|....|.......|...../..|....|.....|..|............/...|.........*
 *....|....|.......|..../...|.....\___/...|___......../....|..___....*
 *....|....|.......|.../....|...../...\.......\....../.....|./...\...*
 *....|....|.......|../_____|__..|.....|.......|..../......|/.....\..*
 *....|.....\...../.........|....|.....|.......|.../........\...../..*
 *..__|__....\___/..........|.....\___/...\___/.../..........\___/...*
 *...................................................................*
 */
 
#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF 1000000000000000000
#define MOD 1000000007

void dfs1(vector<int> adj[],bool used[],vector<int> &order,int u) 
{
    used[u] = true;
    for(auto v:adj[u])
        if(!used[v])
            dfs1(adj,used,order,v);
    order.push_back(u);
}

int dfs2(vector<int> adjt[],bool used[],int u) 
{
    used[u] = true;
    int res = 1;
    for(auto v:adjt[u])
        if(!used[v])
            res += dfs2(adjt,used,v);
    return res;
}

void solve(int tc)
{
    int n,m;
    cin >> n >> m;
    vector<int> adj[n],adjt[n];
    for(int i=0;i<m;i++)
    {
        int u,v;
        cin >> u >> v;
        u--,v--;
        adj[u].push_back(v);
        adjt[v].push_back(u);
    }
    bool used[n];
    vector<int> order;
    memset(used,false,sizeof(used));
    for(int i=0;i<n;i++)
        if(!used[i])
            dfs1(adj,used,order,i);
    memset(used,false,sizeof(used));
    reverse(order.begin(), order.end());
    vector<int> components;
    for(auto u:order)
        if(!used[u]) 
            components.push_back(dfs2(adjt,used,u));
    sort(components.begin(),components.end());
    reverse(components.begin(),components.end());
    int sz = components.size();
    int sum[sz],pre[sz];
    sum[0]=components[0];
    pre[0]=0;
    for(int i=1;i<sz;i++)
    {
        pre[i]=pre[i-1]+components[i]*sum[i-1];
        sum[i]=sum[i-1]+components[i];
    }
    int q;
    cin >> q;
    while(q--)
    {
        int k;
        cin >> k;
        int j = upper_bound(pre,pre+sz,k)-pre-1;
        int ans = min(n,sum[j]+(k-pre[j])/sum[j]);
        cout << ans << '\n';
    }
}

int32_t main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    cout.tie(NULL);
    int tc=1;
    // cin >> tc;
    for(int ttc=1;ttc<=tc;ttc++)
        solve(ttc);
    return 0;
}
Editorialist's code (C++)
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());

// SCC - kactl
// scc(graph, [&] (auto &v) {...})
vector<int> val, comp, z, cont;
vector<int> sizes;
int Time, ncomps;
int dfs(int j, auto& g, auto &f) {
	int low = val[j] = ++Time, x; z.push_back(j);
	for (auto e : g[j]) if (comp[e] < 0)
		low = min(low, val[e] ?: dfs(e,g,f));

	if (low == val[j]) {
		do {
			x = z.back(); z.pop_back();
			comp[x] = ncomps;
			cont.push_back(x);
		} while (x != j);
		f(cont); cont.clear();
		ncomps++;
	}
	return val[j] = low;
}
void scc(auto& g, auto f) {
	int n = size(g);
	val.assign(n, 0); comp.assign(n, -1);
	Time = ncomps = 0;
	for (int i = 0; i < n; ++i) if (comp[i] < 0) dfs(i, g, f);
}

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int n, m; cin >> n >> m;
    vector adj(n, vector<int>());
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        adj[--u].push_back(--v);
    }
    scc(adj, [&] (auto &cont) {
        sizes.push_back(cont.size());
    });

    sort(rbegin(sizes), rend(sizes));
    vector<ll> bad(ncomps);
    int tot = 0;
    for (int i = 0; i < ncomps; ++i) {
        bad[i] = 1LL * tot * sizes[i];
        if (i) bad[i] += bad[i-1];
        tot += sizes[i];
        sizes[i] = tot;
    }

    int q; cin >> q;
    while (q--) {
        ll k; cin >> k;
        auto it = upper_bound(begin(bad), end(bad), k);
        int ans = 0, pos = it - begin(bad);
        ans = sizes[pos - 1];
        k -= *prev(it);
        
        if (it != end(bad)) {
            ans += k / sizes[pos - 1];
        }
        cout << ans << '\n';
    }
}