SEQCOST Editorial

PROBLEM LINK:

Contest Division 1
Contest Division 2
Contest Division 3
Contest Division 4

Setter: Jatin Garg, Vladislav Bargatin
Tester: Harris Leung
Editorialist: Pratiyush Mishra, Jakub Safin

DIFFICULTY:

Medium

PREREQUISITES:

Convex functions, binary/ternary search

PROBLEM:

You may create any number of increasing integer sequences starting with 1 and ending with N. The number of sequences containing i may be at most A_i. The cost of a sequence B_1, B_2, \ldots, B_M is \sum_{i=1}^{M-1} (B_{i+1}-B_i)^2. Find the maximum possible value of C \cdot K - U, where K is the number of sequences and U is their total cost.

EXPLANATION:

First off, the number of sequences is clearly at most A_1, so we’re not dealing with any infinities.

The second thing to notice is that if the number of sequences K is fixed (K \le A_1, A_N), then we want to use each i (2 \le i \le N-1) as often as we can, i.e. \min(K,A_i) times, because putting i into a sequence that didn’t contain it only increases the total cost. If i was added between l and r, then (r-l)^2 \gt (i-l)^2+(r-i)^2.

How do we put values into sequences? The formula a^2 = \sum_{i=0}^{a-1} (2i+1) lets us calculate s = \sum_{i=1}^{M-1} (B_{i+1}-B_i)^2 in the following way:

s = 0, t = 1
for i = 2...N:
  s += t
  if i in B: t = 1
  else: t += 2

This process can be performed for all K sequences at once, where t_1, t_2, \ldots, t_K are the temporary parameters for those sequences. When processing the value i, we should set A_i of them to 1 and increase the rest. The optimal way is clearly setting the A_i largest parameters to 1 since that gives a smaller contribution to s later on - in other words, appending i to A_i sequences which ended with the smallest values.

It can be implemented in O(N) time if we use a deque of pairs (j, number of sequences ending with j). We remove pairs from the end until we get at least A_i sequences / subtract from the last pair if it had more, then add (i,A_i) to the front. Since for each i, we add at most two pairs to the deque, we can only remove at most 2N times as well. Updating the sum of costs is straightforward during this process. Let’s denote the optimal cost for K sequences by U_K.

Finding the best number of sequences

Let’s make an explicit formula for U_K. We’ve picked some values V_1, V_2, \ldots, V_L in sorted order, where the first K values are 1 and the last K are N, and each value appears at most M times. By induction for i \ge K+1, when we’re adding V_i as described above, the current M sequences end with V_{i-K}, \ldots, V_{i-1}, and we want to add V_i to the sequence that ends with the smallest number, which is V_{i-K}, making sequences that end with V_{i-K+1}, \ldots, V_i. Therefore, U_K = \sum_{i=K+1}^L (V_i-V_{i-K})^2.

Now we need to notice that this formula for U_K remains the same if each value i appears exactly A_i times in the sequence V, not \min(K,A_i) times. If A_i \ge K, then inserting i once more among \ge K occurrences of i doesn’t change \sum (V_i-V_{i-K})^2, we just add (i-i)^2 = 0.

With a fixed sequence V, we can show that U_K is non-increasing for increasing K. For K \le \min(A_1,A_N), let’s define V_{\le 0} = V_1 = 1 and write the difference

U_K-U_{K-1} = \sum_{i=K}^L (V_i-V_{i-K})^2 - \sum_{i=K}^L (V_i-V_{i-K+1})^2 = \sum_{i=K}^L V_{i-K}^2-V_{i-K+1}^2+2V_i(V_{i-K+1}-V_{i-K}) = V_{K-K}^2-V_{L-K+1}^2 + \sum_{i=K}^L 2V_iD_{i-K} = 1-N^2 + 2 \sum_{i=1}^{L-1} V_{i+K}D_i \,,

where D_i = V_{i+1}-V_i. The sum of V_{i-K}^2-V_{i-K+1}^2 is a so-called telescopic sum where we always subtract a term and add it immediately after, so only the first added and last subtracted term remain. Then we used the fact that the last K elements of V are definitely equal to N, so D_i = 0 for i \ge L-K+1, to extend the last sum and shift indices.

When we increase K, then for each i, the value by which D_i is multiplied never decreases, so U_{K+1}-U_K \ge U_K-U_{K-1}. If we’re increasing K, we always just add the constant C to the value to maximise, and decrease that value by U_K-U_{K-1}. In short, we add something non-increasing, so CK-U_K is a concave function and we can use ternary search - or binary search, looking for the largest K (1 \le K \le \min(A_1,A_N)) such that U_K-U_{K-1} \lt C, since that gives the maximum; note that it’s also possible that U_1 = U_1-U_0 \ge C, in which case the best choice is K=0.

TIME COMPLEXITY:

We’re ternary-searching or binary-searching over K, and finding the optimal cost for given K takes O(N), so the time complexity is O(N \log \min(A_1,A_N)).

SOLUTION:

Setter’s Solution
Tester’s Solution