PROBLEM LINKSDIFFICULTYMEDIUMHARD PREREQUISITESKnuthMorrisPratt String Matching Algorithm, Dynamic Programming, Simple Probability PROBLEMChef likes to toss a pizza many times. The sides of the pizza are denoted by F (the front side) and B (the back side). Each side has the same probability of showing up after each toss. A sequence of tosses can be represented by a string over alphabet {F, B}, where each character represents the side that shows up in the corresponding toss. You are given a string S1 of length N. This is called the lucky sequence. You are also given a string S2 of length M. This string represents the sequence of tosses Chef has made today. Count the expected number of additional tosses Chef must make until the string representing the last N tosses is equal to S1. If Chef has already made a lucky sequence today, the answer is 0. QUICK EXPLANATIONUse dynamic programming approach. The state is E[k] = the expected number of additional tosses when the last k tosses match the first k tosses in the lucky sequence. EXPLANATIONFor simplicity, we will assume that the strings S1 and S2 has been decoded. We will use KMP String Matching Algorithm extensively in the solution. Specifically, we will use the socalled "failure function" feature from KMP. So, let's recall how failure function works. Consider a string S of length N. For convenience, we will use 1based indexing. We will use notation S[a..b] to denote the contiguous substring of S from the ath character to the bth character, inclusive. Let's introduce fail[] array. fail[i] is equal to k, where k is the maximum integer such that S[1..k] = S[ik+1..i]. In other words, fail[i] is the length of the longest prefix of S[1..i] that is also a suffix of it. This failure function table can be computed in O(N). The pseudocode is as follows. fail[1] = 0 k = 0 for i = 2; i <= N; i++: while k > 0 and S[k+1] != S[i]: k = fail[k] if S[k+1] == S[i]: k = k + 1 fail[i] = k Note that in some KMP references, k must be less than i (i.e., both the prefix and suffix must be proper). For the purpose of this problem, we will not use that restriction. Now, let's solve the problem. First, apply the failure function to S1 as fail_S1[]. Then, apply the failure function to S2 as fail_S2[] as well, but by comparing its characters to S1 as follows. In other words, fail_S2[i] = the maximum integer k such that S1[1..k] = S2[ik+1..i]. k = 0 for i = 1; i <= M; i++: while k > 0 and S1[k+1] != S2[i]: k = fail_S2[k] if S1[k+1] == S2[i]: k = k + 1 fail_S2[i] = k if k == N: k = fail_S2[k] There are two possible cases. Case 1. The string S1 is a substring of S2. We can check this by utilizing the failure function we just built and use KMP to find the match. More specifically, we check whether there is integer k such that fail_S2[k] = N. The answer for this case is 0, since Chef has already have the lucky sequence and therefore does not have to make any additional tosses. Case 2. The string S1 is not a substring of S2. We will use dynamic programming approach in this case. Let E[k] = the expected number of additional tosses Chef must make if the string denoting the last k tosses of the current sequence is equal to S1[1..k]. The answer to the original problem would be E[fail_S2[M]]. The base case of this DP solution is E[N] = 0, i.e., if the string denoting the last N tosses of the current sequence = S1[1..N] = S1, then Chef has made the lucky sequence and he should stop tossing pizzas. Let's define the recurrence relation for E[k], where k < N. Chef must make at least one additional toss. Let C be the side of the pizza that shows up on the next toss. There are two cases:
Therefore, E[k] = (2 + E[k+1] + E[rematch[k]]) / 2. We can rewrite this equation into: 2E[k] = 2 + E[k+1] + E[rematch[k]] Because both k and rematch[k] are not greater than k+1, we have a noncircular function definition here. We will solve the equations by expressing each E[k] as P(k) + Q(k)E[0]. We will solve for P(k) and Q(k). First, because E[1] = 2 × E[0]  E[0]  2, then P(1) = 2 and Q(1) = 1. Let t = rematch[k]. For k > 1, E[k+1] = 2E[k]  E[t]  2 Therefore, P(k+1) = 2P(k)  P(t)  2 Having able to calculate P() and Q(), now we have this ultimate equation: E[N] = P(N) + Q(N)E[0] and since we know that E[N] = 0, E[0] = P(N) / Q(N) Finally, the answer to this problem is E[fail_S2[M]] = P(fail_S2[M]) + Q(fail_S2[M])E[0]. The time complexity of this solution is O(N). Once more, do not forget to perform all calculations modulo 1,000,000,0007. SETTER'S SOLUTIONCan be found here TESTER'S SOLUTIONCan be found here
This question is marked "community wiki".
asked 24 Sep '12, 00:59

It can be proved that E[0] = sum of 2^k over those values of k for which S1[1..k] = S1[Nk+1..N]In fact the special case of this problem when M=0 was proposed earlier at Petrozavodsk summer training camp (August 2008) but with arbitrary size of alphabet. You can try this version here.
I solved this problem long time ago and use this fact to simplify calculations in this problem.
That is calculate Also instead of KMP one can use Zalgorithm which allows to do the same in linear time. See my solution for details. answered 24 Sep '12, 01:10
Actually my setter's solution (not explained in the editorial) does use this recurrence.
(24 Sep '12, 01:17)
I tried using E[0]= 2^n2 (expected number of tosses of a fair coin to get (n1) consecutive heads). Can you sketch a proof for your E[0] formula?
(24 Sep '12, 01:30)

The sketch of the proof of the explicit formula for E[0]: E[0] = sum of 2^k over those values of k for which S1[1..k] = S1[Nk+1..N]. By definition of expected value E[0] = sum (k * F(k) / 2^k : 1 <= k < infty ),where F(k) is the number of binary sequences of length k for which S1 firstly occurs at positions kN+1,...,k .
Clearly Denote by Then by basic combinatorial and probabilistic reasoning it can be proved that F(k) = 2 * F(k1)  F(kN) + sum (2 * F(kN+i1)  F(kN+i) : i in Pi(S1)), k>N Which can be written as F(k) = a(1) * F(k1) + ... + a(N) * F(kN). Now using basic theory of generating functions we have F(x) = sum (F(k) * x^k : 1 <= k < infty ) = x^N / (1  a(1)x  ...  a(N)x^N) Then differentiating this by E[0] = x * F'(x) at x=1/2 As was said by Igor Chevdar in the video of the explanation of the mentioned above problem from Timus after many many many transformations we get the required formula :) BTW, this sketch is based completely on this video. answered 24 Sep '12, 02:01
@anton_lunyov which video are you mentioning? please provide a link.
(07 Oct '12, 17:02)
It is provided only to Petrozavodsk trainig Camp contestants and not available in the internet. Moreover, it's in Russian :)
(07 Oct '12, 20:04)

A slight observation on the editorial solution.
From the above, you can see that the "coefficient" of E[0] will always be 1. i.e. Q(k) = 1 for all k. Hence, you can consider just finding values for P(k), and then your solution for E[k] = P(k)  P(N), since we need E[N] to be 0. Its like just translating the values of P(k) by P(N), once you have assumed that P(0) = 0. answered 24 Sep '12, 17:37

You need to compute value answered 24 Sep '12, 04:40

Is the fail_S2 pseudocode correct? Shouldn't the first character of S2 be compared? If N=M=1 and it's the same letter, I think it'll fail. Or any case where the first letter of S2 is the same as the first as S1, the loop will start comparing S2[2] with S1[1] answered 24 Sep '12, 06:00

How can the initial string be included in this method? If the initial string is blank we can hash the values and find how many prefixes of string are equal to suffixes of string of the same length and sum 2^[length]... Is KMP a sure shot requirement for this problem? answered 26 Sep '12, 14:28
I read at Codeforces that actually the answer is E[0,S1]  E[0,P], where P is the largest prefix of S1 that is suffix of S2. Probably it follows somehow from @pragrame remark. Clearly P also can be calculated in O(N) time using hashes. So indeed the problem can be solved in O(N) using hashes without KMP (or Zalgo).
(26 Sep '12, 14:44)

Hi, it would be more helpful if you can discuss this rematch[] function more. I cannot understand what would be the value of C here if i make such a function. I have thought over it, it can be either 0 or 1. so if S1[q+1] is 0 then C would be 1 and vice versa . Am i right ? answered 24 Nov '12, 19:20
