CLHMSG - Editorial



Author: Soumik Sarkar
Tester: Subham Das
Editorialist: Soumik Sarkar




String matching, Dynamic programming


The inputs are an encoding procedure, an encoded string S, and a plain string W. The procedure is to generate binary representations of the positions of each character in the alphabet, and then append them to form a single string. Applying the reverse of the encoding procedure is not guaranteed to produce a single plaintext. Find the number of times W occurs in all possible decoded strings that can be obtained from S.


Encode W to get W_e. Generate 1-dimensional dp arrays such that dp_{fwd}[i] is the number of decodings of S[i..len(S)] and dp_{bck}[i] is the number of decodings of S[1..i]. Locate positions where W_e occurs within S, store these in m. The answer is then \sum_{m_i \in m} dp_{bck}[m_i-1] \times dp_{fwd}[m_i+len(W_e)]


The description of the problem might be scary, but it is not a very hard problem. First, let us think about a simpler version of the problem. Given an encoded string S as above, how many different decoded strings can be obtained from it? This problem begs a dynamic programming solution.

Let dp[i] be the number of decoded strings that can be obtained from the substring S[i..len(S)]. Assume that we already know all dp[j] for j>i. Consider only the substring starting at i. What can be the first character of this substring? Well, it could be just the character whose binary is S[i]. Then dp[i+1] can be added to dp[i], since if we consider S[i] as a character, the number of decoded strings that can be formed in just dp[i+1]. What other choices do we have? The first two bits at i could also correspond to some alphabet. So considering S[i..(i+1)] as the first character, we add dp[i+2] to dp[i]. We next consider S[i..(i+2)], then S[i..(i+3)], and so on. We continue this until we exceed the length of S or can no longer form a valid alphabet, which happens when the binary value of our first character exceeds 26. Keep in mind that no character will begin with a leading 0, so if S[i] = 0 then dp[i] = 0.

Let’s get back to the problem at hand. If W appears as a substring in some decoded string, the encoded form of W must appear in S at some position. Let the encoded form of W be W_e. So, if W_e appears at position i in S, this means that some decoded strings of S may have W as a substring. How many of these substrings are there? It’s simply the number of decoded strings that can be formed from the substring S[1..(i-1)] times the same of S[(i+len(W_e))..len(S)].

This is why we require two dp arrays as described in the quick explanation. To find all occurences of the substring W_e in S, we can use any linear substring matching algoithm such as KMP or Z-algorithm.

The complexity of this approach is \mathcal{O}(k \cdot |S|) where k is the number of iterations to find each dp state in worst case, which is \lceil \log_2 26 \rceil = 5 here.


Author’s solution can be found here.