# PROBLEM LINK:

Practice

Contest: Division 1

Contest: Division 2

Contest: Division 3

Contest: Division 4

* Authors:* rivalq, iceknight1093, leucifer

*iceknight1093*

**Tester & Editorialist:**# DIFFICULTY:

2970

# PREREQUISITES:

Computing SCCs of a digraph, observation

# PROBLEM:

You have an array A containing N integers.

You’re allowed to swap A_i and A_j if A_i \ \& \ A_j = 0.

Find the lexicographically smallest array you can obtain.

# EXPLANATION:

Let’s start with a slow solution first.

Consider an undirected graph on N vertices, where i and j are joined by an edge iff A_i \ \& \ A_j = 0.

Let’s look at a connected component of this graph, say containing vertices \{v_1, v_2, \ldots, v_k\}. Let v_i \lt v_{i+1}.

Clearly, no matter how we perform swaps, these k vertices can only be permuted within themselves.

So, the best we can hope for is that within this component, elements are sorted.

That is, the smallest A_{v_i} value among them is placed at v_1, the second smallest is placed at v_2, and so on.

In fact, it is possible to achieve this!

## Proof

Suppose A_i and A_j are in the same component.

Then, there’s a path between them, say A_i, x_1, x_2, \ldots, x_k, A_j.

Perform the following sequence of swaps:

- Swap A_i and x_1. The order is now [x_1, A_i, x_2, \ldots, x_k, A_j].
- Swap x_1 and x_2. The order is now [x_2, A_i, x_1, x_3, \ldots, x_k, A_j].
- Swap x_2 and x_3, then x_3 and x_4, and so on till you can swap x_k and A_j.

The order is now [A_j, A_i, x_1, x_2, x_3, \ldots, x_k]. - Now do this in reverse, swapping x_k with x_{k-1} and so on till you swap x_1 and x_2.

The order is now [A_j, A_i, x_2, x_3, \ldots, x_k, x_1]. - Finally, swap A_i with x_1.

Now, A_i and A_j have swapped positions, and everything else is where it started; as we wanted!

Since we can perform arbitrary swaps within a component, any permutation is achievable.

Now, the above solution is obviously too slow since there can be \mathcal{O}(N^2) edges.

However, notice that we don’t really care about the graph we had: we only care about its *connected components*.

In particular, if we could find an equivalent graph that had far fewer edges but preserved connectivity relations, we’d be done.

Here’s one way to accomplish that.

Let \neg \ x denote the integer (2^{20}-1)\oplus x, i.e, the 20-bit number that has a bit set iff x doesn’t.

In particular, since we’re dealing with 20-bit integers, x \ \& \ y = 0 iff x is a submask of \neg \ y (and vice versa).

Consider a *directed* graph on N + 2^{20} vertices, where vertices 0, 1, 2, \ldots, 2^{20}-1 represent *values* and vertices 2^{20}, 2^{20}+1, \ldots, 2^{20}+N-1 represent the N *indices*.

Create the following edges:

- For each index i, the edge i \to \neg \ A_i
- For each index i, the edge A_i \to i
- For each value x, the edge x \to y, where y is a bitmask obtained by removing exactly one bit from x.

For example, if x = 7, the edges created here would be 7 \to 6, 7 \to 5, 7 \to 3.

This ensures that there’s a directed path from x to all of its submasks.

Interestingly, this graph achieves what we want: A_i and A_j lie in the same component of the original graph iff indices i and j lie in the same **strongly connected component** of our new directed graph.

## Proof

Suppose A_i \ \& \ A_j = 0 (which means A_j is a submask of \neg \ A_i).

Then, the graph we created has the path i \to \neg \ A_i \to A_j \to j.

Similarly, the path j \to \neg \ A_j \to A_i \to i will also exist; which means that i and j lie in the same SCC.

##
Now, if A_i and A_j were in the same component of the original graph, that means there’s a path connecting them consisting of swappable elements.

Using the above argument, It’s easy to extend that to a path from i to j (and vice versa) in our digraph, which means once again i and j lie in the same SCC.

Conversely, suppose a path i \to j exists.

Let P be a shortest path from i to j.

There are two possibilities: either P doesn’t contain any other index, or it does.

Suppose P doesn’t contain any other index.

Then, note that by construction, the only outgoing edge from i is i \to \neg \ A_i, and the only incoming edge to j is A_j \to j.

This means P is a path from \neg \ A_i to A_j, that doesn’t touch any other index.

However, the way we constructed the graph on values, each vertex can only reach its submasks.

So, A_j must be a submask of \neg \ A_i, which means A_i \ \& \ A_j = 0.

Next, suppose P does contain other indices.

Let the indices we visit be, in order, i \to x_1 \to x_2 \to \ldots \to x_k \to j

The above argument tells us that each adjacent pair of elements here has bitwise AND 0.

So, i and j were in the same component of the original graph as well, and we’re done!

Our digraph has 2N + 2^{20}\cdot 10 edges, which is small enough that a linear-time SCC algorithm should run in time.

If your implementation uses too much memory (especially, many small lists) it can affect runtime as well.

One optimization which allows for a decently large speedup is to just not store most edges.

The edges of the form x \to submask(x) form the bulk of the edges, but they have a rather nice structure, so instead of storing them, they can simply be created and used when processing the appropriate vertex.

# TIME COMPLEXITY

\mathcal{O}(N\log N + 2^{20}\cdot 10) per testcase.

# CODE:

## Editorialist's code (C++)

```
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());
int main()
{
ios::sync_with_stdio(false); cin.tie(0);
const int bits = 20;
const int full = (1 << bits) - 1;
int n; cin >> n;
vector<int> a(n);
for (int &x : a) cin >> x;
vector<int> indices(n), start(1 << bits, -1);
for (int i = 0; i < n; ++i) indices[i] = i;
sort(begin(indices), end(indices), [&] (int i, int j) {return a[i] < a[j];});
for (int i = 0; i < n; ++i) {
if (i > 0 and a[indices[i]] == a[indices[i-1]]) continue;
start[a[indices[i]]] = i;
}
basic_string<int> val, comp, z, cont;
int Time, ncomps;
auto dfs = [&] (const auto &self, int u, auto &f) -> int {
int low = val[u] = ++Time, x; z.push_back(u);
if (u <= full) {
for (int bit = 0; bit < bits; ++bit) if (u & (1 << bit)) {
int e = u ^ (1 << bit);
if (comp[e] < 0) low = min(low, val[e] ?: self(self,e,f));
}
if (start[u] != -1) {
int v = start[u], value = a[indices[v]];
while (v < n) {
int e = indices[v];
if (a[e] != value) break;
e += 1 << bits;
if (comp[e] < 0) low = min(low, val[e] ?: self(self,e,f));
++v;
}
}
}
else {
int e = a[u - full - 1] ^ full;
if (comp[e] < 0) low = min(low, val[e] ?: self(self,e,f));
}
if (low == val[u]) {
do {
x = z.back(); z.pop_back();
comp[x] = ncomps;
cont.push_back(x);
} while (x != u);
f(cont); cont.clear();
ncomps++;
}
return val[u] = low;
};
auto SCC = [&] (auto f) {
int N = n + (1 << bits);
val.assign(N, 0); comp.assign(N, -1);
Time = ncomps = 0;
for (int i = 0; i < N; ++i) if (comp[i] < 0) dfs(dfs, i, f);
};
vector<int> ans(n);
SCC([&] (const auto &v) {
vector<int> indices, values;
for (int x : v) if (x > full) {
indices.push_back(x-1 - full);
values.push_back(a[x-1-full]);
}
if (indices.empty()) return;
sort(begin(indices), end(indices));
sort(begin(values), end(values));
int sz = indices.size();
for (int i = 0; i < sz; ++i) ans[indices[i]] = values[i];
});
for (int x : ans) cout << x << ' ';
cout << '\n';
}
```