ANKPAL - Editorial

kevinsogo · June 12, 2015, 5:46am

PROBLEM LINK:

Author: Ankit Srivastava
Tester: Kevin Atienza and Gedi Zheng
Editorialist: Kevin Atienza

PREREQUISITES:

String processing, string polynomial hashing

PROBLEM:

You are given a string S, and you have to answer Q queries. Each query consists of four integers (i,j,k,l), and you need to tell whether the substring from the k th position to the l th position is a palindrome assuming the substring from the i th position to the j th position is reversed temporarily (i.e. reversals are only applied per query and do not persist for further queries).

QUICK EXPLANATION:

Fix a (prime) modulus M (preferably between 10^9 and 2\cdot 10^9), and a base B (preferably a generator modulo M) and define the polynomial hash of the string S = S_0S_1\cdots S_{n-1} as the following:

\text{hash}(S) = \left(\sum_{i=0}^{n-1} H[S_i]B^i\right) \bmod M

where H[c] is some fixed value from 0 to M-1 assigned to c.

If M and B are chosen well, two strings S and T are equal if and only if \text{hash}(S) = \text{hash}(T) with high probability, and S is a palindrome iff \text{hash}(S) = \text{hash}(S^R) with high probability (S^R is the reversal of S).

If we are given two strings S, T and their hashes, then the hash of ST (their concatenation) can easily be computed in O(1) time (assuming we have precomputed the powers of B).

Preprocess S so that we can compute the polynomial hash of any substring and reversal of any substring in O(1) time.

Next, to answer the query (i,j,k,l), first check if [k,l] intersects with [i,j]. If not, we can simply compute the hash of S[k,l] and its reverse. Otherwise, split [k,l] into three parts, where the middle part is contained completely in [i,j]. The hashes of the three parts (and their reverses) can be computed separately (with the middle one requiring special care), and can be concatenated in O(1) time and compared. If the hash of the string and its reverse match, then the answer is “Yes”, otherwise it is “No”.

EXPLANATION:

When faced with a problem, there are a couple of helpful general tips and strategies that might help you solve it:

Solve a simpler version of the problem
Use one of the standard algorithms for this type of problem

There are times when these tips do not apply, and those cases usually require a lot more thinking and thus are usually more interesting. But let’s try to apply these strategies in the problem at hand.

Solve a simpler version

How can we simplify the problem? Well, of course we can reduce the bounds of |S| and Q to, say, \le 10^3, but in that case the answer is easily brute-forceable, so it doesn’t get us closer to a fast solution.

Well, we can simplify the problem by removing the “reversal” operation for each query, i.e. each query consists of two integers (k,l), and you need to tell whether the substring from the k th position to the l th position is a palindrome. (You can view this simplified problem as a special case of the original problem with i = j = 1, for example) Now this simplification is more interesting

In this case, one can apply standard palindrome algorithms like Manacher’s algorithm to determine the longest palindrome centered at each position. This takes O(|S|) preprocessing and O(1) query time. However, this solution doesn’t adapt well once we add in the reversal operations, so let’s try to find another solution.

Use a standard algorithm

At this point, we try to find what else can be done to check for palindromes. Well, what else are standard algorithms for this task? Hashing, of course! In fact, in many occasions one might prefer a hashing algorithm instead of Manacher’s algorithm because it’s more flexible, and easier to implement (whereas Manacher’s algorithm can be tricky to implement, and is not as flexible).

A string is palindromic if and only if it is equal to its reversal. However, the reversal of any substring of a string U is a specific substring of the reversal of U. In more detail, if we denote the reversal of a string S by S^R, then we’re saying that if U[i..j] is a substring, then U[i..j]^R = U^R[N+1-j..N+1-i], where N is the length of U. Therefore, we have reduced a query into a case of comparing whether substrings of two strings are equal. But substring comparisons can be done quickly using polynomial hashing!

Polynomial Hashing

A hash is simply a function that maps an object of an arbitrary size (such as a string) into an value of fixed size (such as a bounded integer).

There will inevitably be pairs of objects that will be mapped to the same value (in fact, infinitely many of them). We call a situation where two different objects are mapped to the same hash value a collision. Collisions are inevitable, however we desire hash functions such that it is “random” enough so that for any string we will hash, it will be assigned a hash value with equal “probability” among all possible hash values, and that it is not particularly easy to generate different strings with the same hash value.

One application of hash values is in comparing strings: If our hash function is good enough, then two different strings have hashes with high “probability”. Let’s be a bit more specific. Suppose that our hash function maps strings to integers from 0 to M-1. Suppose that we have two different strings (S,T). If we assume that the hash function is “random” enough, then the “probability” that S and T have different hashes is \approx 1 - 1/M, which is pretty close to 1 assuming M is large. If we want to compare K pairs of different strings (S_1,T_1), \ldots (S_K,T_K), then the probability that every pair have different hashes is \approx (1 - 1/M)^K.

In our case, we will need to compare K \approx 10^5 pairs of strings, so if we choose M to be around 10^9, then the probability that no collision will occur is approximately (1 - 1/10^9)^{10^5} \approx .9999. Therefore, our chances of succeeding are pretty high!

One common hash function is the polynomial hash. To compute polynomial hashes, we first fix two integers M and B. Also, for every character c alphabet, we assign a fixed value H[c] (for example, the ASCII value of c). Then the polynomial hash of the string S = S_0S_1\cdots S_{n-1} is the following:

\text{hash}(S) = \left(\sum_{i=0}^{n-1} H[S_i]B^i\right) \bmod M

Clearly the result is a number in [0,M-1), but to minimize potential easy “collisions”, we ensure that M is prime, and that the order of B modulo M is reasonably large.

Why did we choose the polynomial hash? Because it has other advantages that will be useful in our case:

Given any two strings S and T, the hash of the concatenation ST can be quickly computed if we know the hashes of S and T, because of the following identity

\text{hash}(ST) = \left(\text{hash}(S) + \text{hash}(T)B^{|S|}\right) \bmod M

Given any two strings S and T, the hash of T can be quickly computed if we know the hashes of S and ST, because of the following identity (which follows easily from the previous identity)

\text{hash}(T) = \left(\text{hash}(ST) - \text{hash}(S)\right)B^{-|S|} \bmod M

Given any string S, one can easily preprocess it so that the hash of any substring S[i..j] can be computed in O(1) time. To do so, we first precompute the hashes of all prefixes of S. Let’s denote the hash of S[1..j] by h(j). This can be done in O(|S|) time using the first identity:

h(i) = \left(h(i-1) + H[S_i]B^{i-1} \right) \bmod M

Now, with the array h (and precomputed powers of B and B^{-1} modulo M), the hash of the substring S[i..j] can now be computed using the second identity:

\text{hash}(S[i..j]) = \left(h(j) - h(i-1)\right)B^{-(i-1)} \bmod M

This is great, because this gives us an O(1)-time algorithm to answer our query: preprocess the string S and its reversal S^R as above, and simply check if the hashes of S[k..l] and S[k..l]^R = S^R[N+1-l..N+1-k] are equal! The overall algorithm runs in O(|S| + Q).

Solving the problem with reversals

Now, we go back to the original problem, where each query has a reversal operation. Let’s try to answer the query (i,j,k,l), where the substring S[i..j] is reversed temporarily.

Our goal is to be able to compute hashes of substrings assuming that S[i..j] has been reversed. Let’s handle the simple cases first:

If [k,l] doesn’t overlap with [i,j], then we can just ignore the reversal operation, and the hash of S[k..l] is still the same.
Next, if [k,l] is completely contained in [i,j], then a simple adjustment can be done: The hash of S[k..l] in the updated string is just the hash of S^R[N+1-i-j+k..N+1-i-j+l] (why?)

Therefore, we only have two remaining cases to handle:

[i,j] partially overlaps with [k,l]
[i,j] is completely contained in [k,l]

But in these cases, we can split the string S[k..l] into at most three parts, where each part is either completely contained in S[i..j], or completely outside. We have just described above how compute the hashes of each of the parts quickly, and we can compute the hash of the concatenation of two strings quickly (using the identity in the previous part). Therefore, we now have an algorithm to compute the hash of any substring S[k..l] assuming some substring S[i..j] is reversed temporarily! Since we only compute hashes of a fixed, bounded number of parts, this operation takes O(1).

Since we can now compute hashes of substrings even when some substring is temporarily reversed, we can now answer queries (i,j,k,l) in O(1) time!