MKSTR - Editorial

likecs · July 28, 2018, 4:58pm

Problem Link

Practice

Contest

Author: Ivan Fekete

Tester: Misha Chorniy

Editorialist: Bhuvnesh Jain

Difficulty

MEDIUM

Prerequisites

Dynamic Programming, Tries, Hashing

Problem

You are given a empty string S and a taget string T. Youa re also provided with N patterns p[i]. You may perform any number of operations as follows:

Insert character ‘x’ to the left of the string L. Cost: cl[x] * |L|
Insert character ‘x’ to the right of the string L. Cost: cr[x] * |L|
Insert string ‘p[i]’ to the left of the string L. Cost: kl[i] * |L|
Insert string ‘p[i]’ to the right of the string L. Cost: kr[i] * |L|

Find the minimum cost to make string T from S using the above operations. Note that |L| denotes the current length of the string L, before the operation is performed.

Explanation

The problem statement clearly lists some transitions and cost related to it and we would like to minimise the overall cost. So, this hints towards a dynamic programming approach.

Let us define dp[i][j] as the minimum cost to make the string T[i..j] using the above operations. The transitions are exactly the same as given in the statement. Let L = (j - i + 1).

dp[i+1][j] + cl[T[i]] * (L - 1)
dp[i][j-1] + cr[T[j]] * (L - 1)
dp[i+x][j] + kl[y] * (L - x), where x is length of any string p[y] such that T[i..(i+x-1)] = p[y].
dp[i][j-x] + kr[y] * (L - x), where x is length of any string p[y] such that T[(j-x+1)..j] = p[y].

dp[i][j] is clearly the minimum over all possible transitions. The time complexity of the above logic is O(|S| * |S| * N * |P[i]|). This is enough to pass the first subtask.

But there is a small caveat while implementing the above solution. The most general way to implement dp solution is as follows:


	for i in [0, |S|-1]:
		for j in [i, |S|-1]:
			calculate dp[i][j] using all transitions

But the above implementation is wrong as the transitions for dp[i][j] require us to have the correct values of all dp[x][y] calculated where i <= x <= y <= j. So, we want to calculate the dp of smaller substrings first and then extend it to bigger substrings. A working pseudo-code for the first subtask is as follows:


	for len in [1, |T|]:
		# First find for subtrings of length 1, then 2 and so on.
		for i in [0, |T|-len]:
			j = i + len - 1
			res = dp[i+1][j] + cl[T[i]] * (len - 1)
			res = min(res, dp[i][j-1] + cr[T[j]] * (len - 1))
			for y in [1, n]:
				x = |p[y]|
				if x > len:
					continue
				# attach left
				match = 1
				for l in [0, x-1]:
					if p[y][l] != T[i+l]:
						match = 0
				if match:
					res = min(res, dp[i+x][j] + kl[y] * (len - x))
				match = 1
				for l in [0, x-1]:
					if p[y][l] != T[j-x+1+l]:
						match = 0
				if match:
					res = min(res, dp[i][j-x] + kr[y] * (len - x))
			dp[i][j] = res
	print dp[0][|T|-1]

Full Solution:

Note that in the above solution, the only time consuming step is the transitions from all possible strings p[y]. Since we know that the maximum length of any string p[y] is atmost 100, we see that dp[i][j] would take atmost 102 transitions (2 for characters). If we can do each transition in O(1) then we can completely solve the problem.

The first thing to observe is that we have atmost 100 different strings which could be appended to the left or right while calculating the cost of a substring. Instead of iterating over all possible patterns p[y], we need to efficiently find the cost of making that substring (if possible) from the set of patterns. This requires efficient string matching from a set of patterns. Below are 2 different approaches to it:

Author’s solution

Make 2 tries of all the patterns, one for the left side transitions and another for the right side transitions. For each node in the trie also store the minimum cost of making that string. For all nodes, it is initially INF, some large value. Only for nodes where a pattern p[y] ends, the cost is set to the minimum of kl[y] or kr[y] over all possible patterns ending there (Note that there can be multiple same patterns but with different cost). To efficiently search for the cost of the substring, we simply traverse the trie and after each transition update the dp values.

The overall time complexity of the above solution is now O(|S| * |S| * \max_{i=1}^{i=N} |P[i]| + \sum_{i=1}^{i=N} |P[i]|), where the first part is for dynamic programming calculating and the second part is for building the trie. The space complexity of the solution is O(|S| * |S| + |N| * |P| * 26), where the first part is for the dynamic programming table and second is for the trie of patterns.

Once, you are clear with the above idea, you can see the author implementation below for help.

Editorialist’s solution

We can use hashing for string comparison as well. So, we hash all the patterns and store the cost of them in a map. We also store the hash of text T as well. Now, we calculate the cost of every substring as follows:


	# H[S] denotes the hash of stirng S.
	# mp[H[P]] stores the cost for pattern P.
	for i in [0, |S|-1]:
		for j in [i, |S|-1]:
			w = H[S[i..j]]
			if w is present in mp:
				cost_l[i][j] = mp[H[P]].first
				cost_r[i][j] = mp[H[P]].second
			else:
				cost_l[i][j] = INF
				cost_r[i][j] = INF

The time complexity for the above approach is O(\sum_{i=1}^{i=N} P[i] + |S| + |S| * |S| * \log{N} + |S| * |S| * \max_{i=1}^{i=N} |P[i]|), where the first 2 parts are for building the hashes, the third for the above pseudo-code (\log{N} for searching the map) and last part for dynammic programming approach. The space complexity of the approach is O(|S| * |S|) which is required for storing the dynammic programming table and also the cost of every substring.

NOTE: For the hash based solution the string lengths are small and also the number of strings to consider i.e. N are large so there can be chances of collision. So, it is preferred to use double hashing in this problem. For details refer to this blog on codeforces.

Once, you are clear with the above idea, you can see the editorialist implementation below for help.

Feel free to share your approach, if it was somewhat different.

Time Complexity

O(|S| * |S| * \max_{i=1}^{i=N} |P[i]| + \sum_{i=1}^{i=N} |P[i]|) or O(\sum_{i=1}^{i=N} P[i] + |S| + |S| * |S| * \log{N} + |S| * |S| * \max_{i=1}^{i=N} |P[i]|)

Space Complexity

O(|S| * |S| + |N| * |P| * 26) or O(|S| * |S|)

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.

Tester’s solution can be found here.

Editorialist’s solution can be found here.

batura_dima · July 29, 2018, 4:57am

In Author’s solution maybe correct will be O(S * S * max(|P[i]|)), because from every [l,r] there we will go for whole tries in worse case?

batura_dima · July 30, 2018, 4:15am

I’ve described untidily.

For every char in T we will preprocess set of reachable P[i] (in both directions of course). It will take O(S * max(|P[i]|)).

Then for every [l,r] we will already have possible transitions by strings from P.

batura_dima · July 30, 2018, 4:28am

It seems that in solution with hash time complexity for main calculation is O(S * S * max(|P[i]|) * lg(|P|)). Because of same reason. For every [l,r] we go to distance max(|P[i]|).

likecs · July 29, 2018, 12:51pm

@batura_dima you are correct. Maybe I should have mentioned what I meant by |P| as it clearly is not the size of one particular string.

batura_dima · July 29, 2018, 2:34pm

As P is array, so usually |P| means size of array.
I thought there are speeded up traverse of trie, but as I see it is usual.

We can achieve O(S * S * |P|) for main calculation, if we preprocessed for every position in T set of reachable P[i], so for every i we will only once do traverse of trie (in both directions of course). So it will be O(S * S * |P| + S * max(|P[i]|)).

likecs · July 29, 2018, 7:41pm

@batura_dima, how do you conclude O(S * S * |P|) for main calculation? Btw I have updated complexity section in the editorial.

likecs · July 30, 2018, 10:22am

@batura_dima updated. Where do you get the additional O(\log{|P|} factor from?

batura_dima · July 30, 2018, 3:24pm

You wrote yourself: “(logN for searching the map)”

likecs · July 30, 2018, 4:36pm

That part comes in precomputation where cost of every substring is stored. If you do it during each DP calculation, it will lead to TLE as the number of operations would be 1000 * 1000 * 100 * \log(10000). Also, the factor is O(\log N) where N is the number of string whose hash value is added in the map, not O(\log(|P|). You can refer to my code if something is not clear.