CIELHACK - Editorial

anton_lunyov · October 16, 2012, 3:39pm

PROBLEM LINK:

Practice
Contest

Author: Hiroto Sekido
Tester: Anton Lunyov
Editorialist: Anton Lunyov

DIFFICULTY:

HARD

PREREQUISITES:

Probability theory, Harmonic number, Euler-Maclaurin formula, Binary Search, Dynamic Programming Over Subsets

PROBLEM:

It is the best described in the problem statement.

QUICK EXPLANATION:

Denote by Time[i][j] the minimal expected time needed to crack i-th piece at the j-th center. After we compute Time[i][j] for all possible i and j we can solve the remaining problem of assigning centers to pieces by quite standard DP over subsets of pieces in O(2^K * K * C) time. The only unusual thing that needs careful consideration is that centers can have both positive and negative coordinates.

Clearly, Time[i][j] depends only on number of possible passwords N_i, number of computers P_j, and parameters S_j and T_j of computer center. So we can write Time[i][j] = F(N_i, P_j, S_j, T_j). So our main problem is to calculate F(N, P, S, T) fast enough.

Denote by A(k) the minimal expected time for connecting k computers for S = 1 (recall that Ciel can use different scenarios for this). Further, denote by B(N, k) the expected time for cracking the piece by k computers for T = 1. Then

**F(N, P, S, T) = min{S * A(k) + T * B(N, k) : 1 ≤ k ≤ P}**.

It can be shown that

**A(k) = 1 + 1/2 + ... + 1 / (k − 1)**.

Moreover, all Ciel scenarios lead to the same expected time. Further, it can be shown that

**B(N, k) = (1 / N)^k + (2 / N)^k + ... + (N / N)^k**.

The values A(k) can be computed directly for small k < 10⁵ (or 10⁶) and saved in the array before processing test data, while for the large k the following approximate formula can be used (see here):

**1 + 1 / 2 + ... + 1 / n = log(n) + C + 1 / 2 / n**,

where C is Euler-Mascheroni constant. This formula has an absolute error no more than 1 / 12 / n². So it is more than enough to calculate A(k) for k > 10⁵ with required precision.

Calculation of B(N, k) is much trickier.

If t = (k + 1) / N < 2 we can use Euler-Maclaurin formula for f(x) = x^k with m = 0 and n = N:

**B(N, k) = N / (k + 1) + 1 / 2 + sum(B(2 * j) / (2 * j)! * k * (k − 1) * ... * (k − 2 * j + 2) / N^{2 * j − 1} : 1 ≤ j ≤ p)**,

where B(h) is the h-th Bernoulli number (see here). This formula has relative error no more than 3 * (t / 2 / Pi)^{2 * p} for p ≥ 2 (Pi = 3.1415926…). So for t ≤ 2 we can safely use it with p = 10. However for very small k this formula is wrong. To use it safely take p = min(10, k / 2). The thing is that the summand with 2 * j − 1 = k should be zero since the derivative f^(k)(0) is non-zero and coincide with f^(k)(N), while for all other j it is fine.

If t ≥ 2 then most of the summands in B(N, k) are small. Namely,

**B(N, k) = ((N − p) / N)^k + ... + (N / N)^k**

with relative error no more than e^{−p * t} / t. So for t > 2 we can use it with p = 10.

It can be proved that the sequence a(k) = S * A(k) + T * B(N, k) at first decreases and then increases. Hence to find the optimal number of computers we can use ternary search. But you should be careful. The ordinal way has precision issues even at the 4-th example in the problem statement. Consider the difference a(k + 1) − a(k). It is equal to S / k − T * (B(N, k) − B(N, k + 1)). To compare it with zero the difference B(N, k) − B(N, k + 1) should be calculated carefully. Namely if (k + 1) / N ≤ 2 one should calculate this difference by pencil and paper from the corresponding formulas for B(N, k) and B(N, k +1). If (k+1) / N > 2 then the same should be applied but the formula for this difference is simpler:

**B(N, k) − B(N, k + 1) = sum((j / N) * ((N − j) / N)^k : 1 ≤ j ≤ p)**.

So, in fact, the ternary search transforms to the binary search for the sequence k * (B(N, k) − B(N, k +1)).

EXPLANATION:

1. Why the optimal connection time is S * (1 + 1 / 2 + … + 1 / (k − 1))?

At first consider the case when preference list for each computer is (1, 2, …, k). In this case it is clear, that at each time all computers that have already the program and the piece, start simultaneously the connection trial to the next computer. (At first 1 try to connect to 2. When connection is established both 1 and 2 try to connect to 3 simultaneously. When some of them is connected, all three computers 1, 2 and 3 try to connect to 4 simultaneously, and so on).

At the i-th step first i computers start simultaneously the connection trials to the (i + 1)-th computer. The time when at least one of them will be connected is the random variable equals to Y = min{Y₁, …, Y_i}, where Y_h is a time needed by the h-th computer to connect to the (i + 1)-th computer and according to the problem statement it is a random variable with the exponential distribution with mean S = S_j. According to this Y also has exponential distribution but with mean S / i. So it means that the expected time to connect to the (i + 1)-th computer is S / i. Adding this for all i from 1 to k − 1 we get the above value.

Now let’s understand why this number is the same for all possible preferences lists scenarios. Assume that at some time we have i computers with the program and the piece where i < k. Clearly each of them tries to connect to some other computer that is empty. In general situation they could start their connection trials at different time. But due to the memorylessness of the exponential distribution we can assume that all they start their trials simultaneously (read carefully mentioned section at Wikipedia to understand this properly). So as in the previous example the time when the next connection will be established is the random variable equals to the minimum of i equally distributed independent variables that have exponential distribution. Hence the expected time that some computer will receive the program and the piece is S / i. Again adding this for all i we get the above value.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.
Tester’s solution can be found here.

RELATED PROBLEMS:

Collecting Bonuses – Topcoder SRM 400, Div I, Hard

bcurcio · October 16, 2012, 4:36pm

I have a question regarding A(k). Wasn’t it allowed to connect computers 1->2 in time 0, 1->3 and 2->4 in time 1, 1->5 2->6 3->7 and 4->8 in time 2, etc… ? This corresponds to A(k)?

acube · October 17, 2012, 4:43am

It is not optimal. In your case, assuming that we have N computers with the program and we have at least N more without it, the expected time to connect N more would be S.
One thing to note is that when k computers try to connect to a single computer the expected time is S/k. So whenever we have N computers and we want to add N more, if we make all computers try to connect to single computer one by one, the expected time then is: S/N + S/(N+1) + S/(N+2) + … + S/(N+N-1) which is less than S.

bcurcio · October 17, 2012, 5:08am

Ohhh, I see. I mixed up connection time with copying time and assumed two computers wouldn’t try to connect to the same computer. Now it makes more sense, thanks

pragrame · October 18, 2012, 4:44pm

@acube, IMO there is no difference HOW you order the networking connections, AS LONG AS You dont have any IDLE computer till the time you get all the computers (that you need) up.

In the case you have considered, “the expected time to connect the N more would be S”, you are assuming that once a computer has connected its required computer, it then goes idle. But this is not the case.

I look at it this way: Suppose at some time you have ‘k’ computers running. Now, IRRESPECTIVE of who is trying to connect whom, the expected time to get the (k+1)-th computer running Would be S/k. (limited text)

pragrame · October 18, 2012, 4:49pm

I reached upto the B(n, k) part, and then got stuck. With the precision requirements.