THEGAME - Editorial

suh_ash2008 · September 3, 2015, 7:29pm

PROBLEM LINK:

Practice
Contest

Author: Jakub Safin
Tester: Kevin Atienza
Editorialists: Suhash Venkatesh and Pushkar Mishra

DIFFICULTY:

Medium-Hard

PREREQUISITES:

BFS, DFS, Probability, Expected value

PROBLEM:

Given a R\times C maze where each cell is either empty (colored white) or blocked (colored black), you can click on any white cell. Whenever you click on a white cell, it turns red and all cells reachable from it also turn red. The process is repeated until there is a red-colored connected path from start (0, 0) to end (R-1, C-1), when the game ends. You are required to find the expected number of clicks to finish the game.

EXPLANATION:

The first thing to notice is that given a set of connected white cells, clicking on any of them turns all these cells to red. Hence, we first need to group the cells into connected components. This can be achieved by using a DFS - Depth first search (or) BFS - Breadth first search.

It is given that there is at least one connected path of white cells from start (0, 0) to end (R-1, C-1). It is easy to note that after breaking the white cells into groups of connected components, there is exactly one component which contains both start and end cells. One more thing to note is that we are only interested in the number of cells in each component, since clicking on any cell in a component turns all cells in that component to red.

Let us assume that there are (n+1) connected components with sizes x_{0}, x_{1}, ..., x_{n}, where (n >= 0). Let x_{0} be the component which contains both the start and end cells. The game ends when we click on any of these x_{0} cells.

Subtask 1

For this subtask, R\cdot C <= 30. It is easy to observe that the maximum number of connected components is utmost (R\cdot C) / 2 = 15. In fact, it is lower than 15 since we haven’t accounted for the existence of a path from start to end. But 15 is a good enough bound for us to be able to come up with a solution.

We can use a bit mask to denote which components have already been visited. At every step, try to click on any cell of any unclicked component. We can try every possible unclicked component in every step. The game ends when we choose to click on any of the x_{0} cells (component containing the start and end cells).

Pseudo code:

// There are (n+1) components totally => x0, x1, ... xn 
for (i = 0 to 2^(n+1)-1) {
    dp[mask] = -1
}
double E(mask) {
    if (dp[mask] > 0)  return dp[mask];
    sum = 0, ans = 0;
    for (i = 0 to n) {
        if ((1<<i) & mask)  continue;
        sum += x[i];
    }
    for (i = 0 to n) {
        if ((1<<i) & mask)  continue;
        if (i == 0) ans += (x[i] / sum) \* (1);
        else ans += (x[i] / sum) \* (1 + E(mask ^ (1<<i)));
    }
    return dp[mask] = ans;
}

In the above code, calling

E(0)

gives the required answer. Let us take an example with n = 4, and we have to compute E(10010). This means that we have already clicked on the components 1 (x_{1}) and 4 (x_{4}). The number of remaining cells which can be clicked upon in this step is (x_{0} + x_{2} + x_{3}). The probability of clicking on any cell in component i is now x_{i} / (x_{0} + x_{2} + x_{3}), where i = 0, 2 (or) 3. Now, we can take any of these choices and the expected value can be written as:

E(10010) = \left(\frac{x_{0}}{x_{0} + x_{2} + x_{3}}\right) \cdot (1) + \left(\frac{x_{2}}{x_{0} + x_{2} + x_{3}}\right) \cdot (1 + E(10110)) + \left(\frac{x_{3}}{x_{0} + x_{2} + x_{3}}\right) \cdot (1 + E(11010))

This gives us an \mathcal{O}(n \cdot 2^n) solution, where n = \frac{R\cdot C}{2}. This is enough to pass the first subtask.

Subtask 2

For this subtask, R\cdot C <= 50000, The bit mask method used for subtask 1 is way too inefficient for this. We need to do something much smarter. Continuing from where we left in the previous subtask, let’s define E_{n}(x[0..n]) as the expected number of clicks to finish the game when there are (n+1) components of sizes x_{0}, x_{1}, ..., x_{n}. Let us also define E_{n-1}(x[0..n]\setminus x[i]) to be the same as above, but containing only n components, with the component x_{i} removed. We can write the expected value as:

\begin{aligned}E_{n}(x[0..n]) &= \frac{x_{0}}{\sum_{j=0}^{n} x_{j}} + \sum_{i=1}^{n} \frac{x_{i}}{\sum_{j=0}^{n} x_{j}} \cdot (1 + E_{n-1}(x[0..n]\setminus x[i])) \\ & = 1 + \sum_{i=1}^{n} \frac{x_{i}}{\sum_{j=0}^{n} x_{j}} \cdot E_{n-1}(x[0..n]\setminus x[i]) \end{aligned}

This indicates that the expected value should have some closed form. Let’s try to find the expected values for small values of n. It is obvious that E_{0} = 1.

E_{1}(x[0..1]) = 1 + \left(\frac{x_{1}}{x_{0}+x_{1}}\right) \cdot E_{0} = 1 + \left(\frac{x_{1}}{x_{0}+x_{1}}\right)

\begin{aligned} E_{2}(x[0..2]) &= 1 + \left(\frac{x_{1}}{x_{0}+x_{1}+x_{2}}\right) E_{1}(x[0..2]\setminus x[1]) + \left(\frac{x_{2}}{x_{0}+x_{1}+x_{2}}\right) E_{1}(x[0..2]\setminus x[2]) \\ & = 1 + \left(\frac{1}{x_{0}+x_{1}+x_{2}}\right) (x_{1} \cdot E_{1}(x[0..2]\setminus x[1]) + x_{2} \cdot E_{1}(x[0..2]\setminus x[2])) \\ & = 1 + \left(\frac{1}{x_{0}+x_{1}+x_{2}}\right) \left(x_{1} \left(1 + \left(\frac{x_{2}}{x_{0}+x_{2}}\right)\right) + x_{2} \left(1 + \left(\frac{x_{1}}{x_{0}+x_{1}}\right)\right) \right)\\ & = 1 + \left(\frac{x_{1}}{x_{0}+x_{1}}\right) + \left(\frac{x_{2}}{x_{0}+x_{2}}\right) \end{aligned}

You can use a pen and paper to do the above simplification and see that it is straight forward to get to the above derivation. Looking at the pattern, you might be tempted to generalize the following:

E_{n}(x[0..n]) = 1 + \sum_{i=1}^{n} \left(\frac{x_{i}}{x_{0}+x_{i}}\right)

Turns out that this is indeed the solution to the problem. However, we still need to prove this. Let us try to prove this by induction. Let us assume the following:

E_{i}(x[0..i]) = 1 + \sum_{j=1}^{i} \left(\frac{x_{j}}{x_{0}+x_{j}}\right) \\ \text{Hence, } E_{i-1}(x[0..i]\setminus x[k]) = 1 -\left(\frac{x_{k}}{x_{0}+x_{k}}\right) + \sum_{j=1}^{i} \left(\frac{x_{j}}{x_{0}+x_{j}}\right), (1<= k <=i)

We can write E_{i+1} as follows:

\begin{aligned}E_{i+1}(x[0..i+1]) &= 1 + \sum_{j=1}^{i+1} \frac{x_{j}}{\sum_{k=0}^{i+1} x_{k}} \cdot E_{i-1}(x[0..i]\setminus x[j]) \\ & = 1 + \sum_{j=1}^{i+1} \frac{x_{j}}{\sum_{k=0}^{i+1} x_{k}} \cdot \left(1 -\left(\frac{x_{j}}{x_{0}+x_{j}}\right) + \sum_{m=1}^{i+1} \left(\frac{x_{m}}{x_{0}+x_{m}}\right)\right) \\ & = 1 + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{j=1}^{i+1} \left(\frac{x_{0} \cdot x_{j}}{x_{0}+x_{j}} + \sum_{m=1}^{i+1} \left(\frac{x_{m} \cdot x_{j}}{x_{0}+x_{m}}\right)\right) \\ & = 1 + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{j=1}^{i+1} \left(\frac{x_{0} \cdot x_{j}}{x_{0}+x_{j}}\right) + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{j=1}^{i+1} \sum_{m=1}^{i+1} \left(\frac{x_{m} \cdot x_{j}}{x_{0}+x_{m}}\right) \end{aligned}

In the last double summation, note that the order of summation doesn’t matter. Hence, we can write the above equation as follows:

\begin{aligned}E_{i+1}(x[0..i+1]) &= 1 + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{j=1}^{i+1} \left(\frac{x_{0} \cdot x_{j}}{x_{0}+x_{j}}\right) + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{m=1}^{i+1} \left(\frac{x_{m} \cdot \sum_{j=1}^{i+1} x_{j}}{x_{0}+x_{m}}\right) \\ & = 1 + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{j=1}^{i+1} \left(\frac{x_{0} \cdot x_{j}}{x_{0}+x_{j}}\right) + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{m=1}^{i+1} \left(\frac{x_{m} \cdot \sum_{j=0}^{i+1} x_{j} - x_{m}\cdot x_{0}}{x_{0}+x_{m}}\right) \\ & = 1 + \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{j=1}^{i+1} \left(\frac{x_{0} \cdot x_{j}}{x_{0}+x_{j}}\right) + \sum_{m=1}^{i+1} \left(\frac{x_{m}}{x_{0}+x_{m}}\right) - \frac{1}{\sum_{k=0}^{i+1} x_{k}} \cdot \sum_{m=1}^{i+1} \left(\frac{x_{0} \cdot x_{m}}{x_{0}+x_{m}}\right) \\ & = 1 + \sum_{m=1}^{i+1} \left(\frac{x_{m}}{x_{0}+x_{m}}\right) \end{aligned}

Hence, we have proved the above formula by induction. The time complexity to perform the BFS/DFS is \mathcal{O}(R\cdot C) and the expected value can be found in linear time in the number of components. We already showed above that \frac{R\cdot C}{2} is a loose upper bound for the maximum number of components. Hence, the problem can be solved in \mathcal{O}(R\cdot C) time.

COMPLEXITY:

\mathcal{O}(R\cdot C) per test case.

SAMPLE SOLUTIONS:

Author
Tester
Editorialist

Amlesh · September 14, 2015, 3:06pm

Awesome problem, this is definitely the most innovative version of this question I’ve seen. Kudos to the author!

Another way intuit to the solution is by using indicator variables. Let \chi_i = 1 if the first click in the i th component occurs before clicking anywhere in x_0 (and 0 otherwise). Thus we want to calculate 1 + \sum_{i=1}^{n} \chi_i (the initial 1 representing actually clicking x_0). Now notice that when calculating \chi_i you can ignore every cell that isn’t in x_i or x_0. Thus the problem reduces considerably: “What is the probability of first clicking on a cell in x_i before one in x_0?”. The answer to this is simply \frac{x_i}{x_0+x_i}, thus getting the required closed form of 1 + \sum_{i=1}^{n} \frac{x_i}{x_0+x_i}

additya1998 · September 14, 2015, 3:21pm

For the first sub-task, for every component, we can either choose than component or leave it. But we have to select the component containing the starting and ending point and apart from that component, we can choose the other components in any order, so I multiplied the value with the factorial of the number of components chosen, not including that component.

Can someone help me figure out what is wrong with the logic or the implementation since I got WA.

https://www.codechef.com/viewsolution/8122170

xellos0 · September 14, 2015, 3:24pm

By the way, did someone try filling the given maze? If you didn’t, you should.

amit2602 · September 15, 2015, 2:52am

can any1 tell me in which case i am failing
https://www.codechef.com/viewsolution/8170945

abscon · September 16, 2015, 3:27pm

@amit2602 : Your solution fails the test case


5 5
o#ooo
o####
o#oo#
o####
ooooo

The answer to the above test case is 1.431818 whereas your code outputs 1.950000. You can view my solution if needed : CodeChef: Practical coding for everyone

progkiller123 · September 17, 2015, 3:46pm

Isn’t the formaula for E(10010) would be this ?? Why Additional 1s are there in Original formula given above
E(10010)=(x0/x0+x2+x3)⋅(1)+(x2/x0+x2+x3)⋅E(10110)+(x3/x0+x2+x3)⋅E(11010)

Amlesh · September 14, 2015, 3:24pm

@additya1998 be very careful, as you aren’t accounting for the number of cells in each connected component! Now if each connected component has exactly 1 element, then using factorials to count the total number of possible click orderings would work for the first subtask, but this isn’t necessarily the case

additya1998 · September 14, 2015, 3:27pm

@Amlesh I am storing the number of cells in each component. Can you provide any case where my solution won’t work?

yash_15 · September 14, 2015, 11:39pm

cant we say that ∑χi<1 because χi = 1 for at most 1 i and for the rest it will be zero.
Actually, we want to calculate E(1+∑χi) which is equal to 1+∑E(χi)by linearity.

yash_15 · September 14, 2015, 11:40pm

^ x_i is actually zi_i ( the indicator variable)

Amlesh · September 15, 2015, 3:54am

So multiple \chi_i can be 1, right? Cause \chi_i just represents the first click in x_i is before the first click in x_0. Thus us being able to decouple them. Think about it this way, the problem becomes: “What is the probability of choosing a white sock before a red one, if there are W white socks and R red ones?” (which is merely \frac{W}{R+W})

lg5293 · September 15, 2015, 10:32am

That explains the title…