Author: Roman Rubanenko
Tester: Tuan Anh
Editorialist: Florin Chirica
dynamic programming, basic probability theory
We add elements from 1 to n to a list until all n elements appear in the list. Each element can be added with the same probability. What’s the expected number of elements from the list?
Let’s keep dp[i][j] = what’s probability to have j different elements in the list, such as length of the list is i. We can simulate this solution until the probabilities become small enough. Then, we can either notice the pattern or precalculate the answers in a constant vector.
Small values of n
Approach for small values of n is pretty common in probability tasks. Let’s keep dp[i][j] = the probability to have j different elements in the list, such as length of the list is i.
Definition of expected value is sum of Probability * Value. In this case, probability is calculated by dp for a given length and value is the length itself. Since the list must contain all elements, we get that our answer is sum of dp[len][n] * len.
The trick used here is to consider precision: the precision required is very small. This means, when values dp[len][n] * len becomes almost zero, we can stop the simulation, since it won’t affect the result anyway (well, actually it would affect the result, but for the given precision, it doesn’t matter). In our solution, we consider almost zero to be < 1e-4.
We’re left with recurrences of dp. An element is being chosen with probability 1 / n. We consider dp = 0. Suppose we’ve calculated dp[i][j] and we want to update the results for dp[i + 1][li]. What can be the recurrences?
Well, to find them, let’s analyze the two cases we have.
- We don’t add a new element
This means, we need to add one of existing j elements. The probability to pick one of existing j elements is j / n. So we get
dp[i + 1][j] += dp[i][j] * (j / n)
- We add a new element
We can choose only (n - j) elements. The probability to pick one of them is (n - j) / n.
So we get dp[i + 1][j + 1] += dp[i][j] * ((n - j) / n).
Pay attention when you calculate dp[i][n]. You are not allowed to add result from dp[i - 1][n], since the list of length i - 1 already contains all n elements. This approach should be used for n <= 400 (or something like this). For larger values, it should time out.
Large values of n
There are two ways to solve for larger values. The first one is the most obvious: run your program locally, then store answers in a file, put answers in an array in your program and simply output relevant content of the array.
The not so obvious one is to notice the answer is about n * ln(n) for large values. For those who like proofs, we leave this one as a homework for you
Author’s solution to be updated soon