Hello, the following problem was asked in a Codenation test recently. I couldn’t make head or tail of the question. Please help me understand the problem and how to solve it. Thanks in advance.

Find the expected value of the number of segments in a string of length A in a language having alphabet size B.

A segment is defined as the maximum contiguous substring containing the same character. Eg. In string 10011. The segments are 1, 00 and 11. The number of segments will be 3.

Input format: The first argument is A and second argument is B.

Output format: Return the expected value of the number of segments. This can be represented in the form of x/y. Return x.y^(-1)(mid 10^9 + 7).

Example : A=1,B=2. Output is 1.
A=2,B=2. Output is 500000005.

The test was a week before so I am not cheating.

What are the constraints on A, B ?

1 Like

1 <= A,B <=10^9

I am sorry for not mentioning this before.

Ok i can give you an idea we can calculate the total sum of number of segments for all strings. We get the answer by dividing it by total number of possible strings . We iterate on number of positions(lets say x) in our string such that S[i] != S[i+1] where S is our string now we can calculate the answer for this x. It would be (x+1) * C(A-1,x) * pow(B-1,x) * (B) . condense this sum using binomial properties to get the answer in O(log(N)) .

1 Like

Let’s say you have the sequence X_1, X_2, X_3,....,X_A, clearly number of ways are B^A. Define variable Y_i as follows:
Y_1 = 1
Y_i = 1 if X_i and X_{i-1} are different and i > 1
Y_i = 0 otherwise
Therefore we need to find E(Y_1+Y_2+Y_3+....+Y_A) = 1 + (A-1)*Y_2, using linearity of expectation.
Now E(Y_2) = Pr(X_2 != X_1) = \frac{B*(B-1)}{B^A}*B^{A-2}, note that B^{A-2} is multiplied because all other elements are independent. Therefore final answer will be: 1 + \frac{(B-1)*(A-1)}{B}.

1 Like