PTOSS - Editorial

fushar · September 24, 2012, 12:59am

PROBLEM LINKS

DIFFICULTY

MEDIUM-HARD

PREREQUISITES

Knuth-Morris-Pratt String Matching Algorithm, Dynamic Programming, Simple Probability

PROBLEM

Chef likes to toss a pizza many times. The sides of the pizza are denoted by F (the front side) and B (the back side). Each side has the same probability of showing up after each toss. A sequence of tosses can be represented by a string over alphabet {F, B}, where each character represents the side that shows up in the corresponding toss.

You are given a string S1 of length N. This is called the lucky sequence. You are also given a string S2 of length M. This string represents the sequence of tosses Chef has made today. Count the expected number of additional tosses Chef must make until the string representing the last N tosses is equal to S1. If Chef has already made a lucky sequence today, the answer is 0.

QUICK EXPLANATION

Use dynamic programming approach. The state is E[k] = the expected number of additional tosses when the last k tosses match the first k tosses in the lucky sequence.

EXPLANATION

For simplicity, we will assume that the strings S1 and S2 has been decoded.

We will use KMP String Matching Algorithm extensively in the solution. Specifically, we will use the so-called “failure function” feature from KMP. So, let’s recall how failure function works.

Consider a string S of length N. For convenience, we will use 1-based indexing. We will use notation S[a…b] to denote the contiguous substring of S from the a-th character to the b-th character, inclusive.

Let’s introduce fail[] array. fail[i] is equal to k, where k is the maximum integer such that S[1…k] = S[i-k+1…i]. In other words, fail[i] is the length of the longest prefix of S[1…i] that is also a suffix of it. This failure function table can be computed in O(N). The pseudocode is as follows.

fail[1] = 0
k = 0
for i = 2; i <= N; i++:
    while k > 0 and S[k+1] != S[i]:
        k = fail[k]
    if S[k+1] == S[i]:
        k = k + 1
    fail[i] = k

Note that in some KMP references, k must be less than i (i.e., both the prefix and suffix must be proper). For the purpose of this problem, we will not use that restriction.

Now, let’s solve the problem. First, apply the failure function to S1 as fail_S1[]. Then, apply the failure function to S2 as fail_S2[] as well, but by comparing its characters to S1 as follows. In other words, fail_S2[i] = the maximum integer k such that S1[1…k] = S2[i-k+1…i].

k = 0
for i = 1; i <= M; i++:
    while k > 0 and S1[k+1] != S2[i]:
        k = fail_S2[k]
    if S1[k+1] == S2[i]:
        k = k + 1
    fail_S2[i] = k
    if k == N:
        k = fail_S2[k]

There are two possible cases.

Case 1. The string S1 is a substring of S2. We can check this by utilizing the failure function we just built and use KMP to find the match. More specifically, we check whether there is integer k such that fail_S2[k] = N. The answer for this case is 0, since Chef has already have the lucky sequence and therefore does not have to make any additional tosses.

Case 2. The string S1 is not a substring of S2. We will use dynamic programming approach in this case.

Let E[k] = the expected number of additional tosses Chef must make if the string denoting the last k tosses of the current sequence is equal to S1[1…k]. The answer to the original problem would be E[fail_S2[M]].

The base case of this DP solution is E[N] = 0, i.e., if the string denoting the last N tosses of the current sequence = S1[1…N] = S1, then Chef has made the lucky sequence and he should stop tossing pizzas.

Let’s define the recurrence relation for E[k], where k < N. Chef must make at least one additional toss. Let C be the side of the pizza that shows up on the next toss. There are two cases:

C is equal to S1[k+1]. Therefore, the expected number of additional tosses Chef must make is 1 + E[k+1].
C is not equal to S1[k+1]. Let q be the maximum integer such that S1[k-q+1..k] + C = S1[1..k+1]. The value of q can be traced from fail_S2[] as follows:
```
q = k
while q > 0 and S1[q+1] != C:
    q = fail[q]
```
The expected number of additional tosses Chef must make is 1 + E[q+1]. For convenience, let rematch[k] = q+1. Note that rematch[0] = 0; this is a special case.

Therefore, E[k] = (2 + E[k+1] + E[rematch[k]]) / 2. We can rewrite this equation into:

2E[k] = 2 + E[k+1] + E[rematch[k]]

E[k+1] = 2E[k] - E[rematch[k]] - 2

Because both k and rematch[k] are not greater than k+1, we have a noncircular function definition here. We will solve the equations by expressing each E[k] as P(k) + Q(k)E[0]. We will solve for P(k) and Q(k).

First, because E[1] = 2 × E[0] - E[0] - 2, then P(1) = -2 and Q(1) = 1. Let t = rematch[k]. For k > 1,

E[k+1] = 2E[k] - E[t] - 2

= 2(P(k) + Q(k)E[k]) - (P(t) + Q(t)E[t]) - 2

= 2P(k) + 2Q(k)E[0] - P(t) - Q(t)E[0] - 2

= (2P(k) - P(t) - 2) + (2Q(k) - Q(t))E[0]

Therefore,

P(k+1) = 2P(k) - P(t) - 2

Q(k+1) = 2Q(k) - Q(t)

Having able to calculate P() and Q(), now we have this ultimate equation:

E[N] = P(N) + Q(N)E[0]

and since we know that E[N] = 0,

E[0] = -P(N) / Q(N)

(the division should be taken as inverse multiplication modulo 1,000,000,007.)

Finally, the answer to this problem is E[fail_S2[M]] = P(fail_S2[M]) + Q(fail_S2[M])E[0].

The time complexity of this solution is O(N).

Once more, do not forget to perform all calculations modulo 1,000,000,0007.

SETTER’S SOLUTION

Can be found here

TESTER’S SOLUTION

Can be found here

anton_lunyov · September 24, 2012, 1:10am

It can be proved that

E[0] = sum of 2^k over those values of k for which S1[1..k] = S1[N-k+1..N]

In fact the special case of this problem when M=0 was proposed earlier at Petrozavodsk summer training camp (August 2008) but with arbitrary size of alphabet. You can try this version here.

I solved this problem long time ago and use this fact to simplify calculations in this problem.
That is calculate E[k+1] directly by recurrence 2*E[k]-E[rematch[k]]-2.

Also instead of KMP one can use Z-algorithm which allows to do the same in linear time.
See my solution for details.

anton_lunyov · September 24, 2012, 2:01am

The sketch of the proof of the explicit formula for E[0]:

E[0] = sum of 2^k over those values of k for which S1[1..k] = S1[N-k+1..N].

By definition of expected value

E[0] = sum (k * F(k) / 2^k : 1 <= k < infty ),

where F(k) is the number of binary sequences of length k for which S1 firstly occurs at positions k-N+1,...,k.

Clearly F(1) = ... = F(N-1) = 0, F(N) = 1.

Denote by Pi(S1) the set of all k mentioned in the formula.

Then by basic combinatorial and probabilistic reasoning it can be proved that

F(k) = 2 * F(k-1) - F(k-N) + sum (2 * F(k-N+i-1) - F(k-N+i) : i in Pi(S1)), k>N

Which can be written as

F(k) = a(1) * F(k-1) + ... + a(N) * F(k-N).

Now using basic theory of generating functions we have

F(x) = sum (F(k) * x^k : 1 <= k < infty ) = x^N / (1 - a(1)*x - ... - a(N)*x^N)

Then differentiating this by x we obtain

E[0] = x * F'(x) at x=1/2

As was said by Igor Chevdar in the video of the explanation of the mentioned above problem from Timus after many many many transformations we get the required formula

BTW, this sketch is based completely on this video.

thocevar · September 24, 2012, 4:40am

The time complexity of this solution is O(N).

You need to compute value q for every k<N. If you do that by tracing over array fail as outlined in the pseudocode, you will end up with a O(N^2) solution. Check problem setter’s strong_kmp() function to see how it can be done in linear time.

bcurcio · September 24, 2012, 6:00am

Is the fail_S2 pseudocode correct? Shouldn’t the first character of S2 be compared? If N=M=1 and it’s the same letter, I think it’ll fail. Or any case where the first letter of S2 is the same as the first as S1, the loop will start comparing S2[2] with S1[1]

pragrame · September 24, 2012, 5:37pm

A slight observation on the editorial solution.

We will solve the equations by expressing each E[k] as P(k) + Q(k)E[0]. We will solve for P(k) and Q(k).
…
Q(k+1) = 2Q(k) - Q(t)

From the above, you can see that the “coefficient” of E[0] will always be 1. i.e. Q(k) = 1 for all k. Hence, you can consider just finding values for P(k), and then your solution for E[k] = P(k) - P(N), since we need E[N] to be 0. Its like just translating the values of P(k) by -P(N), once you have assumed that P(0) = 0.

calc_saransh · September 26, 2012, 2:28pm

@anton_lunyov

How can the initial string be included in this method? If the initial string is blank we can hash the values and find how many prefixes of string are equal to suffixes of string of the same length and sum 2^[length]… Is KMP a sure shot requirement for this problem?

javadecoder · October 8, 2012, 1:10am

A similar problem SPOJ Coin Tossing

ashishnegi001 · November 24, 2012, 7:20pm

Hi, it would be more helpful if you can discuss this rematch[] function more. I cannot understand what would be the value of C here if i make such a function. I have thought over it, it can be either 0 or 1.

so if S1[q+1] is 0 then C would be 1 and vice versa . Am i right ?

yellow_agony · September 24, 2012, 1:17am

Actually my setter’s solution (not explained in the editorial) does use this recurrence.

bcurcio · September 24, 2012, 1:30am

I tried using E[0]= 2^n-2 (expected number of tosses of a fair coin to get (n-1) consecutive heads). Can you sketch a proof for your E[0] formula?

fushar · September 25, 2012, 3:31pm

This is fixed, thanks.

anton_lunyov · September 26, 2012, 2:44pm

I read at Codeforces that actually the answer is E[0,S1] - E[0,P], where P is the largest prefix of S1 that is suffix of S2. Probably it follows somehow from @pragrame remark. Clearly P also can be calculated in O(N) time using hashes. So indeed the problem can be solved in O(N) using hashes without KMP (or Z-algo).

kriateive · October 7, 2012, 5:02pm

@anton_lunyov which video are you mentioning? please provide a link.

anton_adm · October 7, 2012, 8:04pm

It is provided only to Petrozavodsk trainig Camp contestants and not available in the internet. Moreover, it’s in Russian