SORTP9 - Editorial


Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Authors: rivalq, iceknight1093, leucifer
Tester & Editorialist: iceknight1093




Computing SCCs of a digraph, observation


You have an array A containing N integers.
You’re allowed to swap A_i and A_j if A_i \ \& \ A_j = 0.

Find the lexicographically smallest array you can obtain.


Let’s start with a slow solution first.

Consider an undirected graph on N vertices, where i and j are joined by an edge iff A_i \ \& \ A_j = 0.
Let’s look at a connected component of this graph, say containing vertices \{v_1, v_2, \ldots, v_k\}. Let v_i \lt v_{i+1}.

Clearly, no matter how we perform swaps, these k vertices can only be permuted within themselves.
So, the best we can hope for is that within this component, elements are sorted.
That is, the smallest A_{v_i} value among them is placed at v_1, the second smallest is placed at v_2, and so on.

In fact, it is possible to achieve this!


Suppose A_i and A_j are in the same component.

Then, there’s a path between them, say A_i, x_1, x_2, \ldots, x_k, A_j.

Perform the following sequence of swaps:

  • Swap A_i and x_1. The order is now [x_1, A_i, x_2, \ldots, x_k, A_j].
  • Swap x_1 and x_2. The order is now [x_2, A_i, x_1, x_3, \ldots, x_k, A_j].
  • Swap x_2 and x_3, then x_3 and x_4, and so on till you can swap x_k and A_j.
    The order is now [A_j, A_i, x_1, x_2, x_3, \ldots, x_k].
  • Now do this in reverse, swapping x_k with x_{k-1} and so on till you swap x_1 and x_2.
    The order is now [A_j, A_i, x_2, x_3, \ldots, x_k, x_1].
  • Finally, swap A_i with x_1.
    Now, A_i and A_j have swapped positions, and everything else is where it started; as we wanted!

Since we can perform arbitrary swaps within a component, any permutation is achievable.

Now, the above solution is obviously too slow since there can be \mathcal{O}(N^2) edges.

However, notice that we don’t really care about the graph we had: we only care about its connected components.
In particular, if we could find an equivalent graph that had far fewer edges but preserved connectivity relations, we’d be done.

Here’s one way to accomplish that.

Let \neg \ x denote the integer (2^{20}-1)\oplus x, i.e, the 20-bit number that has a bit set iff x doesn’t.
In particular, since we’re dealing with 20-bit integers, x \ \& \ y = 0 iff x is a submask of \neg \ y (and vice versa).

Consider a directed graph on N + 2^{20} vertices, where vertices 0, 1, 2, \ldots, 2^{20}-1 represent values and vertices 2^{20}, 2^{20}+1, \ldots, 2^{20}+N-1 represent the N indices.

Create the following edges:

  • For each index i, the edge i \to \neg \ A_i
  • For each index i, the edge A_i \to i
  • For each value x, the edge x \to y, where y is a bitmask obtained by removing exactly one bit from x.
    For example, if x = 7, the edges created here would be 7 \to 6, 7 \to 5, 7 \to 3.
    This ensures that there’s a directed path from x to all of its submasks.

Interestingly, this graph achieves what we want: A_i and A_j lie in the same component of the original graph iff indices i and j lie in the same strongly connected component of our new directed graph.


Suppose A_i \ \& \ A_j = 0 (which means A_j is a submask of \neg \ A_i).
Then, the graph we created has the path i \to \neg \ A_i \to A_j \to j.
Similarly, the path j \to \neg \ A_j \to A_i \to i will also exist; which means that i and j lie in the same SCC.

Now, if A_i and A_j were in the same component of the original graph, that means there’s a path connecting them consisting of swappable elements.
Using the above argument, It’s easy to extend that to a path from i to j (and vice versa) in our digraph, which means once again i and j lie in the same SCC.

Conversely, suppose a path i \to j exists.
Let P be a shortest path from i to j.
There are two possibilities: either P doesn’t contain any other index, or it does.

Suppose P doesn’t contain any other index.
Then, note that by construction, the only outgoing edge from i is i \to \neg \ A_i, and the only incoming edge to j is A_j \to j.
This means P is a path from \neg \ A_i to A_j, that doesn’t touch any other index.
However, the way we constructed the graph on values, each vertex can only reach its submasks.
So, A_j must be a submask of \neg \ A_i, which means A_i \ \& \ A_j = 0.

Next, suppose P does contain other indices.
Let the indices we visit be, in order, i \to x_1 \to x_2 \to \ldots \to x_k \to j
The above argument tells us that each adjacent pair of elements here has bitwise AND 0.

So, i and j were in the same component of the original graph as well, and we’re done!

Our digraph has 2N + 2^{20}\cdot 10 edges, which is small enough that a linear-time SCC algorithm should run in time.

If your implementation uses too much memory (especially, many small lists) it can affect runtime as well.
One optimization which allows for a decently large speedup is to just not store most edges.
The edges of the form x \to submask(x) form the bulk of the edges, but they have a rather nice structure, so instead of storing them, they can simply be created and used when processing the appropriate vertex.


\mathcal{O}(N\log N + 2^{20}\cdot 10) per testcase.


Editorialist's code (C++)
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
	ios::sync_with_stdio(false); cin.tie(0);

	const int bits = 20;
	const int full = (1 << bits) - 1;
    int n; cin >> n;
    vector<int> a(n);
    for (int &x : a) cin >> x;

	vector<int> indices(n), start(1 << bits, -1);
	for (int i = 0; i < n; ++i) indices[i] = i;
	sort(begin(indices), end(indices), [&] (int i, int j) {return a[i] < a[j];});
	for (int i = 0; i < n; ++i) {
		if (i > 0 and a[indices[i]] == a[indices[i-1]]) continue;
		start[a[indices[i]]] = i;

	basic_string<int> val, comp, z, cont;
	int Time, ncomps;
	auto dfs = [&] (const auto &self, int u, auto &f) -> int {
		int low = val[u] = ++Time, x; z.push_back(u);

		if (u <= full) {
			for (int bit = 0; bit < bits; ++bit) if (u & (1 << bit)) {
				int e = u ^ (1 << bit);
				if (comp[e] < 0) low = min(low, val[e] ?: self(self,e,f));
			if (start[u] != -1) {
				int v = start[u], value = a[indices[v]];
				while (v < n) {
					int e = indices[v];
					if (a[e] != value) break;
					e += 1 << bits;
					if (comp[e] < 0) low = min(low, val[e] ?: self(self,e,f));
		else {
			int e = a[u - full - 1] ^ full;
			if (comp[e] < 0) low = min(low, val[e] ?: self(self,e,f));

		if (low == val[u]) {
			do {
				x = z.back(); z.pop_back();
				comp[x] = ncomps;
			} while (x != u);
			f(cont); cont.clear();
		return val[u] = low;
	auto SCC = [&] (auto f) {
		int N = n + (1 << bits);
		val.assign(N, 0); comp.assign(N, -1);
		Time = ncomps = 0;
		for (int i = 0; i < N; ++i) if (comp[i] < 0) dfs(dfs, i, f);

	vector<int> ans(n);
	SCC([&] (const auto &v) {
		vector<int> indices, values;
		for (int x : v) if (x > full)  {
			indices.push_back(x-1 - full);
		if (indices.empty()) return;

		sort(begin(indices), end(indices));
		sort(begin(values), end(values));
		int sz = indices.size();
		for (int i = 0; i < sz; ++i) ans[indices[i]] = values[i];
	for (int x : ans) cout << x << ' ';
	cout << '\n';