PATTMATC - Editorial

editorial
kmp
ltime28
medium-hard
pattern
string

#1

PROBLEM LINK:

Practice
Contest

Author: Sergey Kulik
Tester: Yanpei Liu
Editorialist: Pawel Kacprzak

DIFFICULTY:

Medium-Hard

PREREQUISITES:

Strings, Pattern matching, KMP

PROBLEM:


The problem is stated really simple. You are given a string $S$ of $N$ lowercase Latin letters. In addition, you are given a **wildcard** string $T$, of length $M$, consisting of lowercase Latin characters and asterisk symbols '*'.

We said that T generates a string P, if and only if, P can be obtained from T by replacing all the asterisks with any, even empty, lowercase Latin letter strings.

The task is, for each suffix S[i..N] of S, compute the position of the first occurrence in this suffix, of a string P, which can be generated from T.

QUICK EXPLANATION:


Split $T$ on asterisks to get a list $L$ of patterns. Find occurences of all patterns from $L$ in $S$. For each suffix $S[i..N]$ of $S$, find in $S[i..N]$, the leftmost chain of non overlapping occurrences of all consecutive patterns from $L$.

EXPLANATION:


The first observations, which we can make, is that, we can replace any run of more than one asterisk with a single one.

After that, we can split T on asterisks and get a list L of K patterns, where K is not greater that the number of asterisks + 1.

The general idea is to use a pattern matching algorithm to find all occurrences of each pattern from L in S. Since we know that the number of asterisks is at most 500, we can compute the array A*[j], where A*[j] = b, if and only if, the first occurrence of the j^{th} pattern from L in S[i..N] is at index b in S.

Based on the above idea, we can iterate over all suffixes of S, and for each of them, using the A array, we can easily compute the leftmost, non-overlapping occurrences of all consecutive patterns from L, which is equivalent to the leftmost occurrence of a string generated by wildcard string T.

Time Complexity


Notice that, the time complexity of this approach depends on the method used to find occurrences of patterns from $L$ in $S$.

Naive pattern matching can be sufficient to pass some subtasks, but if you want a full credit, and we hope you do, you have to use any linear pattern matching algorithm, like KMP for example. If you do that, the total time complexity of a single test case is O(K \cdot N), because we use linear pattern matching to find occurrences of K patterns in a string of length N, and next, for each of N suffixes, we make at most K jumps over A array.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.
Tester’s solution can be found here.

RELATED PROBLEMS:


#2

Hello, I want to ask whether it is possible to use regex to solve this problem. Another question is, how fast is regex in matching strings compared to algorithms like KMP? Thanks


#3

In this case:

1 ab abacb
I got this: 2 5 5 -1 -1 from tester and Author solution, but I think
for i = 2, you can’t get s[2, 5]=bacb from a * b, because there are
not '
’ at the begining of T

The answer should be 2 -1 5 -1 -1 , anyone?


#4

To previus: It is because the string “bacb” CONTAINS the string generated from “a*b” (which is “acb”). So 5 should be a correct answer at i=2. I had the same mistake as you with this problem during the contest and it took me about an hour before I can realize that :frowning:


#5

Please rejudge the solutions


#6

Obviously depends on the regexp implementation you use.
Since they are much more general they will propably not as fast as the special algo employed here. But it could be fast enough to pass first subtasks – try it.