You are not logged in. Please login at www.codechef.com to post your questions!

×

MKSTR - Editorial

Problem Link

Practice

Contest

Author: Ivan Fekete

Tester: Misha Chorniy

Editorialist: Bhuvnesh Jain

Difficulty

MEDIUM

Prerequisites

Dynamic Programming, Tries, Hashing

Problem

You are given a empty string $S$ and a taget string $T$. Youa re also provided with $N$ patterns $p[i]$. You may perform any number of operations as follows:

  1. Insert character 'x' to the left of the string $L$. Cost: cl[x] * |L|
  2. Insert character 'x' to the right of the string $L$. Cost: cr[x] * |L|
  3. Insert string 'p[i]' to the left of the string $L$. Cost: kl[i] * |L|
  4. Insert string 'p[i]' to the right of the string $L$. Cost: kr[i] * |L|

Find the minimum cost to make string $T$ from $S$ using the above operations. Note that |L| denotes the current length of the string $L$, before the operation is performed.

Explanation

The problem statement clearly lists some transitions and cost related to it and we would like to minimise the overall cost. So, this hints towards a dynamic programming approach.

Let us define $dp[i][j]$ as the minimum cost to make the string $T[i..j]$ using the above operations. The transitions are exactly the same as given in the statement. Let $L = (j - i + 1)$.

  1. dp[i+1][j] + cl[T[i]] * (L - 1)
  2. dp[i][j-1] + cr[T[j]] * (L - 1)
  3. dp[i+x][j] + kl[y] * (L - x), where $x$ is length of any string $p[y]$ such that $T[i..(i+x-1)] = p[y]$.
  4. dp[i][j-x] + kr[y] * (L - x), where $x$ is length of any string $p[y]$ such that $T[(j-x+1)..j] = p[y]$.

$dp[i][j]$ is clearly the minimum over all possible transitions. The time complexity of the above logic is $O(|S| * |S| * N * |P[i]|)$. This is enough to pass the first subtask.

But there is a small caveat while implementing the above solution. The most general way to implement dp solution is as follows:


    for i in [0, |S|-1]:
        for j in [i, |S|-1]:
            calculate dp[i][j] using all transitions

But the above implementation is wrong as the transitions for $dp[i][j]$ require us to have the correct values of all $dp[x][y]$ calculated where $i <= x <= y <= j$. So, we want to calculate the dp of smaller substrings first and then extend it to bigger substrings. A working pseudo-code for the first subtask is as follows:


    for len in [1, |T|]:
        # First find for subtrings of length 1, then 2 and so on.
        for i in [0, |T|-len]:
            j = i + len - 1
            res = dp[i+1][j] + cl[T[i]] * (len - 1)
            res = min(res, dp[i][j-1] + cr[T[j]] * (len - 1))
            for y in [1, n]:
                x = |p[y]|
                if x > len:
                    continue
                # attach left
                match = 1
                for l in [0, x-1]:
                    if p[y][l] != T[i+l]:
                        match = 0
                if match:
                    res = min(res, dp[i+x][j] + kl[y] * (len - x))
                match = 1
                for l in [0, x-1]:
                    if p[y][l] != T[j-x+1+l]:
                        match = 0
                if match:
                    res = min(res, dp[i][j-x] + kr[y] * (len - x))
            dp[i][j] = res
    print dp[0][|T|-1]

Full Solution:

Note that in the above solution, the only time consuming step is the transitions from all possible strings $p[y]$. Since we know that the maximum length of any string $p[y]$ is atmost 100, we see that $dp[i][j]$ would take atmost 102 transitions (2 for characters). If we can do each transition in $O(1)$ then we can completely solve the problem.

The first thing to observe is that we have atmost 100 different strings which could be appended to the left or right while calculating the cost of a substring. Instead of iterating over all possible patterns $p[y]$, we need to efficiently find the cost of making that substring (if possible) from the set of patterns. This requires efficient string matching from a set of patterns. Below are 2 different approaches to it:

Author's solution

Make $2$ tries of all the patterns, one for the left side transitions and another for the right side transitions. For each node in the trie also store the minimum cost of making that string. For all nodes, it is initially $INF$, some large value. Only for nodes where a pattern $p[y]$ ends, the cost is set to the minimum of $kl[y]$ or $kr[y]$ over all possible patterns ending there (Note that there can be multiple same patterns but with different cost). To efficiently search for the cost of the substring, we simply traverse the trie and after each transition update the $dp$ values.

The overall time complexity of the above solution is now $O(|S| * |S| * {max}*{i=1}^{i=N} |P[i]| + \sum*{i=1}^{i=N} |P[i]|)$, where the first part is for dynamic programming calculating and the second part is for building the trie. The space complexity of the solution is $O(|S| * |S| + |N| * |P| * 26)$, where the first part is for the dynamic programming table and second is for the trie of patterns.

Once, you are clear with the above idea, you can see the author implementation below for help.

Editorialist's solution

We can use hashing for string comparison as well. So, we hash all the patterns and store the cost of them in a map. We also store the hash of text $T$ as well. Now, we calculate the cost of every substring as follows:


    # H[S] denotes the hash of stirng S.
    # mp[H[P]] stores the cost for pattern P.
    for i in [0, |S|-1]:
        for j in [i, |S|-1]:
            w = H[S[i..j]]
            if w is present in mp:
                cost_l[i][j] = mp[H[P]].first
                cost_r[i][j] = mp[H[P]].second
            else:
                cost_l[i][j] = INF
                cost_r[i][j] = INF

The time complexity for the above approach is $O(\sum_{i=1}^{i=N} P[i] + |S| + |S| * |S| * \log{N} + |S| * |S| * |P|)$, where the first $2$ parts are for building the hashes, the third for the above pseudo-code ($\log{N}$ for searching the map) and last part for dynammic programming approach. The space complexity of the approach is $O(|S| * |S|)$ which is required for storing the dynammic programming table and also the cost of every substring.

NOTE: For the hash based solution the string lengths are small and also the number of strings to consider i.e. $N$ are large so there can be chances of collision. So, it is preferred to use double hashing in this problem. For details refer to this blog on codeforces.

Once, you are clear with the above idea, you can see the editorialist implementation below for help.

Feel free to share your approach, if it was somewhat different.

Time Complexity

$O(|S| * |S| * {max}*{i=1}^{i=N} |P[i]| + \sum*{i=1}^{i=N} |P[i]|)$ or $O(\sum_{i=1}^{i=N} P[i] + |S| + |S| * |S| * \log{N} + |S| * |S| * |P|)$

Space Complexity

$O(|S| * |S| + |N| * |P| * 26)$ or $O(|S| * |S|)$

AUTHOR'S AND TESTER'S SOLUTIONS:

Author's solution can be found here.

Tester's solution can be found here.

Editorialist's solution can be found here.

This question is marked "community wiki".

asked 28 Jul '18, 16:58

likecs's gravatar image

6★likecs
3.7k2481
accept rate: 9%

edited 29 Jul '18, 19:43


In Author's solution maybe correct will be O(S * S * max(|P[i]|)), because from every [l,r] there we will go for whole tries in worse case?

link

answered 29 Jul '18, 04:57

batura_dima's gravatar image

5★batura_dima
1264
accept rate: 5%

1

@batura_dima you are correct. Maybe I should have mentioned what I meant by |P| as it clearly is not the size of one particular string.

(29 Jul '18, 12:51) likecs6★

As P is array, so usually |P| means size of array. I thought there are speeded up traverse of trie, but as I see it is usual.

We can achieve O(S * S * |P|) for main calculation, if we preprocessed for every position in T set of reachable P[i], so for every i we will only once do traverse of trie (in both directions of course). So it will be O(S * S * |P| + S * max(|P[i]|)).

(29 Jul '18, 14:34) batura_dima5★

@batura_dima, how do you conclude $O(S * S * |P|)$ for main calculation? Btw I have updated complexity section in the editorial.

(29 Jul '18, 19:41) likecs6★

I've described untidily.

For every char in T we will preprocess set of reachable P[i] (in both directions of course). It will take O(S * max(|P[i]|)).

Then for every [l,r] we will already have possible transitions by strings from P.

link

answered 30 Jul '18, 04:15

batura_dima's gravatar image

5★batura_dima
1264
accept rate: 5%

It seems that in solution with hash time complexity for main calculation is O(S * S * max(|P[i]|) * lg(|P|)). Because of same reason. For every [l,r] we go to distance max(|P[i]|).

link

answered 30 Jul '18, 04:28

batura_dima's gravatar image

5★batura_dima
1264
accept rate: 5%

@batura_dima updated. Where do you get the additional $O(\log{|P|}$ factor from?

(30 Jul '18, 10:22) likecs6★

You wrote yourself: "(logN for searching the map)"

(30 Jul '18, 15:24) batura_dima5★

That part comes in precomputation where cost of every substring is stored. If you do it during each DP calculation, it will lead to TLE as the number of operations would be $1000 * 1000 * 100 * \log(10000)$. Also, the factor is $O(\log N)$ where N is the number of string whose hash value is added in the map, not $O(\log(|P|)$. You can refer to my code if something is not clear.

(30 Jul '18, 16:36) likecs6★
toggle preview
Preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported
  • mathemetical formulas in Latex between $ symbol

Question tags:

×15,852
×2,658
×2,214
×593
×81
×51
×47
×24

question asked: 28 Jul '18, 16:58

question was seen: 482 times

last updated: 30 Jul '18, 04:28