PRC - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: pols_agyi_pols
Tester: watoac2001
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Knapsack, elementary knowledge of permutation cycles

PROBLEM:

You’re given a permutation P of 1 through N, and an integer K.
For each x from 1 to N, find the minimum number of swaps needed to reach a permutation where the cycle containing x has size exactly K.

EXPLANATION:

Since we’re dealing with permutation cycles, first we need to understand how a swap changes them.

Consider two elements x and y. Suppose we swap P_x and P_y. Then,

  • If x and y don’t lie in the same cycle initially, their cycles will merge into a single larger cycle whose length is the sum of the lengths of the component cycles.
  • If x and y do lie in the same cycle initially, that cycle will break into two smaller cycles.
    Specifically, if the cycle has length L, and the distance (number of edges) between x and y in the cycle is d, the two new cycles will have lengths d and L-d.

In particular, observe that if x lies on a cycle of some length L, we can always make x lie on a cycle of any length \lt L in a single move.

Now, let’s move to solving the problem at hand.
Suppose there are initially m cycles in P, with lengths C_1, C_2, \ldots, C_m.

Let’s fix an x, and try to solve for x.
Suppose x lies on a cycle of length L_x.
First, a couple of simple cases:

  • If L_x = K, the answer is 0.
  • If L_x \gt K, the answer is 1.

This leaves us with L_x \lt K.
The natural greedy approach now would be to merge with large cycles over and over again till we exceed K, at which point we can break the cycle down to exactly K.

Specifically, let’s find the smallest y such that combining the cycle of x with the largest other y cycles results in a cycle length that’s \geq K.
Then, we can definitely use y+1 moves to get to a length-K cycle: y to combine with these cycles, and then one extra move to break it down to length K.
However, we might be able to do better in some cases - particularly, if we’re able to find exactly y other cycles such that combining them with the the cycle of x results in a length of exactly x, we can save one move and use only y swaps.
Note that using \lt y swaps is impossible, since we are unable to even reach a sum of K using \lt y other cycles. So, if we’re able to use y moves it’s optimal, otherwise y+1 is optimal.

So, our task is now to check if some y-sized subset of the C_i (excluding one copy of L_x) sums to K - L_x.
This seems like a knapsack variant (it’s literally a subset sum, actually), but it’s not immediately obvious how to do it fast, since we also need to repeat this for different x (and hence different L_x).

One observation that can be made here is that, since we’re choosing lengths greedily, if some subset of size y does sum up to K - L_x it must also be the smallest such subset.
So, rather than checking for whether there’s a subset of size exactly y with the necessary sum, all we need to know if the smallest size of a subset with the necessary sum, which is easier to deal with.

Next, we make a rather classical observation: since \sum C_i = N, there are only \mathcal{O}(\sqrt N) distinct values of C_i.
That is, we only need to care about \mathcal{O}(\sqrt N) distinct cycle lengths - so rather than solve the problem for each x, we can solve the problem for each distinct cycle length, which is a much smaller number. Already an improvement!

Now, we work on a more global scale.
Let the distinct cycle lengths be D_1, D_2, \ldots, D_s, in ascending order.
Let F_i denote the number of occurrences of cycle length D_i.

Note that when we’re processing a cycle length D_i, we can freely use all lengths other than D_i - only for D_i alone are we forced to use at most F_i - 1 copies.

To account for this, let’s define two separate knapsacks, which can then be combined as necessary.

  • \text{dp}_1[i][x] is the minimum size of a subset of the first i lengths that sums to x, assuming we can use at most one copy of each length.
    This is easily computed in \mathcal{O}(N\sqrt N) time.
  • \text{dp}_2[x] is the minimum size of a subset that sums to x, assuming we can use at most (F_i - 1) copies of each D_i (that is, all but one copy).
    Since \sum F_i \leq N, this can also be computed in \mathcal{O}(N\sqrt N) time: see speedup 1 here, for instance.

\text{dp}_2 will be further updated later.

Now, let’s iterate over i in descending order, from s to 1.
When at index i, we’ll ensure that \text{dp}_2 allows for using all the elements \gt D_i, as many times as they occur.
When processing D_i:

  • We want to find the minimum size of a subset whose sum is exactly K - D_i.
  • Suppose we’ve ensured that \text{dp}_2 allows for lengths \gt D_i to be used as many times as they occur.
    Note that lengths \leq D_i are still missing one occurrence.
  • However, \text{dp}_1[i-1] allows for using each element \lt D_i exactly once.
    So, we can combine \text{dp}_2 and \text{dp}_1[i-1] to allow for taking everything but one copy of D_i itself!
    In particular, we don’t care about the entire combined array: we only care about the value of this combined array at K - D_i. That can be easily computed in linear time.
  • Once that’s done, simply add one copy of D_i into \text{dp}_2 in linear time.

This way, we’ve found the answers for all D_i in \mathcal{O}(N\sqrt N) time, which is enough to solve the problem!

TIME COMPLEXITY:

\mathcal{O}(N\sqrt N) per testcase.

CODE:

Tester's code (C++)
#include<bits/stdc++.h>

using namespace std;

#define all(x) begin(x), end(x)
#define sz(x) static_cast<int>((x).size())
//#define int long long
typedef long long ll;

const int maxn = 1e5 + 1;
const int inf = 1e9;
int m1[maxn];
int m2[450][maxn];
int freq[maxn];


signed main() {

        ios::sync_with_stdio(0);
        cin.tie(0);


        int t;
        cin >> t;

        while (t--) {


                int n, k;
                cin >> n >> k;
                int a[n];
                for (auto &x : a) {
                        cin >> x;
                        x--;
                }
                for (int i = 0; i <= n; i++) m1[i] = inf, freq[i] = 0;
                for (int i = 0; i < 450; i++) {
                        for (int j = 0; j <= n; j++) m2[i][j] = inf;
                }
                m1[0] = 0; m2[0][0] = 0;

                bool vis[n] = {0};
                int clen[n];
                vector<int> v;
                for (int i = 0; i < n; i++) {
                        if (vis[i]) continue;
                        int len = 1;
                        vis[i] = 1;
                        int cur = a[i];
                        vector<int> f = {i};
                        while (!vis[cur]) {
                                vis[cur] = 1;
                                len++;
                                f.push_back(cur);
                                cur = a[cur];
                        }
                        for (auto u : f) clen[u] = len;
                        freq[len]++;
                        if (freq[len] == 1) v.push_back(len);
                }

                sort(all(v));

                for (auto u : v) {
                        int f = freq[u] - 1;
                        if (f == 0) continue;
                        for (int j = 0; j < u; j++) {
                                int mn = inf;
                                vector<array<int, 2>> v;
                                int i = 0;
                                int r = 0;
                                for (int k = j; k <= n; k += u, r++) {
                                        while (i < sz(v)) {
                                                auto [x, y] = v[i];
                                                if (y >= r - f) break;
                                                i++;
                                        }
                                        int mn = m1[k];
                                        if (i < sz(v)) {
                                                m1[k] = min(m1[k], v[i][0] + r - v[i][1]);
                                        }
                                        while (sz(v)) {
                                                auto [x, y] = v.back();
                                                if (x - y >= mn - r) v.pop_back();
                                                else break;
                                        }
                                        v.push_back({mn, r});
                                        i = min(i, sz(v) - 1);
                                }
                        }
                }

                for (int i = 0; i < sz(v); i++) {
                        int u = v[i];
                        if (i == 0) {
                                m2[i][u] = 1;
                                continue;
                        }
                        for (int j = 0; j <= n - u; j++) {
                                m2[i][j + u] = min(m2[i][j + u], m2[i - 1][j] + 1);
                                m2[i][j] = min(m2[i][j], m2[i - 1][j]);
                        }
                }



                int mn[n + 1];
                fill(mn, mn + n + 1, inf);
                for (int i = sz(v) - 1; i >= 0; i--) {
                        int u = v[i];
                        int req = k - u;
                        if (req < 0) continue;
                        if (i == 0) {
                                mn[u] = m1[req];
                                continue;
                        }
                        for (int j = 0; j <= req; j++) {
                                mn[u] = min(mn[u], m1[req - j] + m2[i - 1][j]);
                        }
                        for (int j = n - u; j >= 0; j--) {
                                m1[j + u] = min(m1[j + u], m1[j] + 1);
                        }
                }

                for (auto u : v) {
                        int req = k - u;
                        int ans = 1;
                        for (int i = sz(v) - 1; i >= 0; i--) {
                                int r = freq[v[i]];
                                if (v[i] == u) r--;
                                for (int j = 1; j <= r; j++) {
                                        if (req <= 0) break;
                                        ans++;
                                        req -= v[i];
                                }
                        }
                        mn[u] = min(mn[u], ans);
                }


                vector<int> ans;

                for (int i = 0; i < n; i++) ans.push_back(mn[clen[i]]);
                for (auto u : ans) cout << u << " "; cout << "\n";
                

        }
        
        
}

However, dp1[i−1]\text{dp}_1[i-1]dp1​[i−1] allows for using each element <Di\lt D_i<Di​ exactly once.

This part I don’t understand, why just taking exactly once. Won’t you sometimes want to take these cycle lengths more than once?

It seems to contradict this idea

Note that when we’re processing a cycle length DiD_iDi​, we can freely use all lengths other than DiD_iDi​ - only for DiD_iDi​ alone are we forced to use at most Fi−1F_i - 1Fi​−1 copies.

Nevermind, it makes sense now. Cause the dp2 is accounting for all elements > Di and is missing one occurrence for elements <= Di. So you want to add one element back in for < Di, that way only missing one occurrence just for Di. It says it in editorial.