DEJAVU - Editorial

PROBLEM LINK:

Contest - Division 3
Contest - Division 2
Contest - Division 1

Setter: Jatin Khandual
Tester: Anshu Garg, Prasanna Kumar

DIFFICULTY:

MEDIUM

PREREQUISITES:

Dynamic Programming, Greedy

PROBLEM:

Given a sequence A of distinct integers. You can perform the following moves:

  • Split the sequence (if there are multiple sequences from past operations, choose any one) into 2 sequences. The cost of this operation is the size of the smaller sequence after the split.
  • Concatenate any two sequences. The cost of this operation is the size of the resulting sequence.

Determine the least cost to sort the sequence A in ascending order.

EXPLANATION:

Observation: It is optimal to do merge operations once we have done all necessary split operations.

Let’s work the problem backwards.

Say we have optimally split A into several sequences. Name the sequences S_1, S_2, \dots, S_k such that the sequence [S_1+S_2+\dots+S_k] is equal to sorted A.
All that remains is to repeatedly merge consecutive sequences in some fashion that minimises the total cost of merging. This is a well known DP problem, which can be solved in cubic time.


Let us now determine the best way to split the sequence.

Let \{S_1, S_2, \dots, S_k\} be the least number of sequences we can split A into, such that it is possible to concatenate the sequences and get sorted A.

How do I find the sequences?

Create an sequence of pairs B, whose elements are \{A_i, i\}, for all i. Sort this sequence in ascending order, by the first term.

Divide B into the least number of contiguous blocks such that, in each block, the second term of the elements are consecutive and increasing. (B_i and B_{i+1} are in different blocks, if their second elements are not consecutive). This can be done easily in O(N).

It is easy to deduce that these blocks correspond to contiguous blocks in A, and that a smaller set of blocks that can be merged to form sorted A is impossible.

We show that

  • any other splitting of A will not be cheaper to merge back.
  • the least cost of splitting A into \{S_1, \dots, S_k\} is equal to N-\max(|S_1|, \dots, |S_k|)
Proof of claim 1

Let P = \{S_1,S_2,\dots,S_k\} be the least number of sequences, such that merging them in that order gives us sorted A. Now, any other valid set must be a further splitting of some of the sequences of this set.

Say we split S_i into S1_i and S2_i, for some fixed i. We show that the least cost of merging this new set Q is \ge least cost of merging P.

Consider some optimal merging of Q. The total cost doesn’t depend on the order in which they are merged, only the number of times each sequence is involved in a merge operation; let these values be c_1,\dots,c1_i, c2_i,\dots,c_k.

Example

If you start with \{a,b,c\} and then move on to \{a+b,c\} and finally \{a+b+c\}, the values of c are \{2,2,1\}.

The total cost of merging the sequences is then

c_1*|S_1|+\dots+c1_i*|S1_i|+c2_i*|S2_i|+\dots+c_k*|S_k|

WLOG, let c1_i \le c2_i.
Now, consider any optimal sequence of merging for P (call it V_1), that mimics the optimal sequence of merging for Q (call it V_2), in the following fashion:

  • If an operation in V_2 merges S1_i (with another sequence), the corresponding operation in V_1 merges S_i (with the same sequence V_2 merged S1_i with).
  • If an operation in V_2 merges S2_i (with another sequence), the corresponding operation in V_1 is skipped (that is, no merge takes place in this step).
  • Other operations are equivalent in V_1 and V_2.

It’s not hard to deduce that, applying the operations of V_1 on P merges it completely. The number of times each sequence is involved in a merge is equal to: c_1, \dots,c1_i,\dots, c_k.
The total cost of merging P (when V_1 is applied on it) is then:

c_1*|S_1|+\dots+c1_i*|S_i|+\dots+c_k*|S_k|

It is easy to deduce that

c1_i*|S1_i|+c2_i*|S2_i| \ge c1_i*|S_i|

showing that, applying V_1 on P has total cost no worse than applying V_2 on Q. Thus, We have shown that the least cost of merging Q is \ge least cost of merging P.


Finally, for the case where sequences are split into multiple parts, apply induction and prove the same.

Proof of claim 2

Claim: In any optimal splitting, if we split a sequence into a and b, with |a|\le|b|, we do not split a again in the future.

Proof

We prove by contradiction.

WLOG, let the original sequence have been [a+b]. Say we split a = [c+d] into c and d, in the future. The total of splitting into c,d,b is equal to:

|a|+\min(|c|, |d|)

Rather, if we first split [c+d+b] into c and [d+b], and then into d and b, the total cost would be:

|c|+|d| = |a|

which is less than the cost in the first way.

Contradiction to the optimality of the splitting, and we are done.

Thus, the least cost to split A is equivalent to the sum of lengths of the smaller sequence in each split, which is equal to N-\max |S_i|.

To summarise, we split the array optimally as shown above, and then calculate (using DP) the least cost to merge them to form sorted A.

TIME COMPLEXITY:

Sorting the array takes O(N\log N). Splitting the sequence optimally takes O(N). Merging it using DP takes O(N^3), making the total complexity:

O(N\log N + N + N^3) \approx O(N^3)

SOLUTIONS:

Editorialist’s solution can be found here

BONUS:

How would you solve the problem, if N \le 10^3?

Hint

Knuth’s Optimisation to reduce the O(N^3) DP to O(N^2).


Experimental: For evaluation purposes, please rate the editorial (1 being poor and 5 excellent)

  • 1
  • 2
  • 3
  • 4
  • 5

0 voters

Is this the same question?
https://codedrills.io/contests/icpc-amritapuri-2020-preliminary-round/problems/break-merge-and-sort

not same, but that question inspired us for this one.