XIN2 - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: iceknight1093
Tester: nskybytskyi
Editorialist: iceknight1093

DIFFICULTY:

Easy-Med

PREREQUISITES:

None

PROBLEM:

Find a permutation P of the integers 1 to N such that P_i \neq i and \max_{i=1}^N \left( P_i \oplus i\right) is minimized.
For full score, also find the lexicographically maximum permutation.

EXPLANATION:

It is recommended that you read and understand the solution to XIN first - namely, finding any valid permutation.

Recall that we had three cases:

  1. N is either a power of 2.
  2. The answer is 3, which occurs when either N is odd or 4k+2.
  3. Every other case - meaning N is divisible by some power of 2 that’s at least 2.

Let’s tackle each of these in turn.


When N is a power of 2, we were forced to have P_1 = N and P_N = 1.
However, the elements inbetween could be arranged in any manner.
So, the best thing is to just arrange them in descending order.
Since N is even, we’re ensured that P_i \neq i for any i, and so this is clearly optimal.


When the answer is 3, observe that we have to ensure P_i and i have the same bitmask of set bits, considering bits \geq 2.
In particular, this means we can split the range [1, N] into blocks of size 4 as [4k, 4k+1, 4k+2, 4k+3] - and we’re only allowed to match an element to something else within its own block.
This means we can independently solve for each block and concatenate the results to obtain the final answer.
Note that the first block will be [1, 2, 3] having size 3, and the last block may have size 2, 3, or 4 (but never 1).

Within a block of size 4, elements can be arranged in descending order.
For size 3 the optimal construction is [3, 1, 2], placing the largest element first.
For size 2 we can only swap the elements.


Finally, we’re left with the case when the answer is 2^k for some k \geq 2.

Once again, looking at bits \gt k, we see that P_i and i are forced to have the same subset of them set.
This breaks [1, N] into several blocks of length 2^{k+1}, with two exceptions:

  • The first block starting at 1 and ending at 2^{k+1} - 1.
  • The last block starting at N - 2^k and ending at N.

Once again, since elements inside a block must stay within it, we can maximize each block individually.

  • The last block’s elements can be arranged in descending order.
  • For full 2^{k+1} blocks, it can be observed that the best we can do is set P_i = i\oplus 2^k.

That leaves only the first block.
Here, it’s again optimal to set P_i = i \oplus 2^k as long as possible.
This fails only once you reach i = 2^k - 1.
From here, the elements \{2^k-2, 2^k-2, 2^k, 2^{k+1}-1\} need to be arranged appropriately at positions \{2^k-1, 2^k, 2^{k+1}-1, 2^{k+1}-2\}.
Figuring this out is left as an exercise to the reader (if you don’t like casework, just bruteforce all permutations of 4 elements.)

TIME COMPLEXITY:

\mathcal{O}(N) per testcase.

CODE:

Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n; cin >> n;

        if ((n & (n-1)) == 0) {
            // Power of 2 -> descending order
            for (int i = n; i >= 1; --i) cout << i << ' ';
            cout << '\n';
        }
        else if (n%4) {
            // ans = 3

            auto solve = [&] (int L, int R) {
                if (L+3 == R) cout << R << ' ' << R-1 << ' ' << R-2 << ' ' << R-3 << ' ';
                else if (L+2 == R) cout << R << ' ' << R-2 << ' ' << R-1 << ' ';
                else cout << R << ' ' << L << ' ';
            };

            solve(1, 3);
            for (int i = 4; i <= n; i += 4)
                solve(i, min(n, i+3));
            cout << '\n';
        }
        else {
            int k = __builtin_ctz(n);
            int pw = 1 << k;
            vector<int> ans(n+1);
            
            // First block
            for (int i = 1; i <= pw - 2; ++i) {
                ans[i] = i + pw;
                ans[i + pw] = i;
            }
            ans[pw-1] = pw-2;
            ans[2*pw-1] = pw-1;
            ans[pw] = 2*pw-1;
            ans[2*pw-2] = pw;
            
            // Full blocks: p[i] = i xor pw
            int i = 2*pw;
            for (; i + 2*pw <= n; i += 2*pw) {
                for (int j = 0; j < pw; ++j) {
                    ans[i+j] = i+j+pw;
                    ans[i+j+pw] = i+j;
                }
            }

            // Last block: descending
            for (int j = i; j <= n; ++j)
                ans[j] = n+i-j;
            swap(ans[n-pw/2], ans[n-pw/2+1]);
            for (i = 1; i <= n; ++i) cout << ans[i] << ' ';
            cout << '\n';

            {
                for (i = 1; i <= n; ++i) {
                    assert(ans[i] != i);
                    assert((ans[i] ^ i) <= pw);
                }
            }
        }
    }
}
Tester's code (C++)
#include <bits/stdc++.h>
using namespace std;

vector<int> solve_mod_four(int n) {
    vector<int> p = {3, 1, 2};

    // Another special case oh well
    if (n == 3) {
        return p;
    }

    // Blocks of 4 in reverse:
    for (int k = 1; 4 * k < ((n & 1) ? (n - 3) : (n - 2)); ++k) {
        // (4k+3) xor (4k) = 3
        // (4k+2) xor (4k+1) = 3
        p.push_back(4 * k + 3);
        p.push_back(4 * k + 2);
        p.push_back(4 * k + 1);
        p.push_back(4 * k);
    }

    // I'm running out of time so no fancy code for you this time
    if ((n & 3) == 1) {
        p.push_back(n);
        p.push_back(n - 1);
    }

    if ((n & 3) == 2) {
        p.push_back(n);
        p.push_back(n - 2);
        p.push_back(n - 1);
    }

    if ((n & 3) == 3) {
        p.push_back(n);
        p.push_back(n - 1);
        p.push_back(n - 2);
        p.push_back(n - 3);
    }

    return p;
}

vector<int> solve_lsb(int n) {
    const auto block = 1 << __builtin_ctz(n);
    const auto first = 2 * block, last = n - block;

    vector<int> p(n);
    // Example first block = 8
    // 1  2  3  4  5  6 7  8 9 10 11 12 13 14 15
    // 9 10 11 12 13 14 6 15 1  2  3  4  5  8  7
    // [max   possible] ^ ^^ [one  option] [rest]
    for (int i = 1; i < block - 1; ++i) {
        p[i - 1] = i ^ block;
    }
    p[block - 2] = block - 2;
    p[block - 1] = first - 1;
    for (int i = block + 1; i < first - 2; ++i) {
        p[i - 1] = i ^ block;
    }
    p[first - 3] = block;
    p[first - 2] = block - 1;

    // Example full blocks: (block = 4)
    // 0b1000 <-> 0b1100
    // 0b1001 <-> 0b1101
    // 0b1010 <-> 0b1110
    // 0b1011 <-> 0b1111
    for (int i = first; i < last; ++i) {
        p[i - 1] = i ^ block;
    }

    // Last block is similar to power of two case
    for (int i = last; i <= n; ++i) {
        p[i - 1] = n - (i - last);
    }
    // To ensure p[n - block / 2] != n - block / 2
    swap(p[n - block / 2 - 1], p[n - block / 2]);

    return p;
}

vector<int> solve_power_of_two(int n) {
    vector<int> p(n);
    for (int i = 0; i < n; ++i) {
        // 1 xor (2^k) = 2^k + 1
        // x xor (2^k+1-x) < 2^k for x > 1
        p[i] = n - i;
    }
    return p;
}

vector<int> solve(int n) {
    if (n & 3) {
        return solve_mod_four(n);
    } else if (n & (n - 1)) {
        return solve_lsb(n);
    } else {
        return solve_power_of_two(n);
    }
}

int main() {
    cin.tie(0)->sync_with_stdio(0);
    int t; cin >> t; while (t--) {
        int n;
        cin >> n;
        const auto p = solve(n);
        for (auto pi : p) {
            cout << pi << ' ';
        }
        cout << '\n';
    }
}