PROBLEM LINK:Author: Sergey Kulik DIFFICULTY:Medium PREREQUISITES:Segment tree PROBLEM:Given a string $s$ of length $N \le 10^5$ generated uniformly randomly and independently, support $Q \le 10^5$ operations of two kinds:
QUICK EXPLANATION:Let $C$ be the length of the longest repeated substring at any point during the operations. Then with very very high probability, $C$ is very small, say, $C \le 24$. So we assume we can compare two suffixes by only comparing their first $C$ characters. Similarly, we can compare cyclic shifts by comparing the first $C$ characters, assuming the cutoff point of the cyclic shift isn't within the last $C$ positions. Finally, there are only $C$ cyclic shifts where the cutoff point is within the last $C$ positions. This means that we can set up a segment tree to compute range minimum queries. Specifically, to find the index of the minimal cyclic shift in the range $[i,j]$, find the minimum suffix in the range $[i,jC]$, and then manually compare this minimum with the last $C$ cyclic shifts. This runs in $O(C^2 + C \log N)$. Finally, during updates where we replace the $j$th character, we simply update the suffixes in the index range $[jC,j]$ in the segment tree. This runs in $O(C^2 \log N)$. EXPLANATION:RandomnessThe fact that the string and the updates are generated uniformly randomly and independently is important: It means we can expect not to see any "nonrandom" structures in the input, like long runs of a single letter Given this, let's see how a naïve algorithm will fare. Note that in the naïve algorithm, modifications are trivial. The only slow part is finding the minimal cyclic shift, which takes $O(N^2)$ in the worst case. In this pseudocode, we let $s[i]$ be the $i$th character of $s$. Note that in the substring $s[i\ldots j]$, the $p$th character in the cyclic shift that starts at $k$ ($i \le k \le j$) is $s[i + (k  i + p  1) \bmod (j  i + 1)]$. Also, be careful: this code uses oneindexing.
Since $s$ is generated uniformly randomly and independently, it means the comparison Thus we can expect the loop in
This is of course incorrect in some cases, but there is a very, very low probability that it will fail, so we're still good! But notice that if we're only comparing the first $24$ characters, then for most pairs of cyclic shifts in $[i,j]$, their comparison is actually the same as the comparison between the suffixes that start in the same location! Specifically, for cyclic shifts that start in the indices $[i,j24]$, we can expect that comparing them will finish before the indices start to wrap around the substring $s[i\ldots j]$. To explain further, we also modify the remaining parts of the pseudocode above:
Notice that in the range $[i,j24]$, we're using the Faster range minimumUnfortunately, the code above is still very slow, because the range $[i,j]$ can be large and there are a lot of queries. Specifically, it runs in $O(QNC)$ time in the worst case, where $C$ is the constant we chose. ($24$ in the code above.) To improve it, notice that the first loop in In this case, we can use segment trees because there are also updates. Specifically, we set up a segment tree over the suffixes of the string, but we only store the indices, and for comparison we use Let's analyze the running time. Segment tree point updates and range queries need $O(\log N)$ comparisons, but each operation takes $O(C)$ time, so a segment tree operation runs in $O(C \log N)$ time. The worst queries are updates, where we perform $C$ segment tree updates, so it takes $O(C^2 \log N)$ time. This means that overall, the running time is $O(QC^2 \log N)$! There's also an $O(NC)$time construction of the segment tree. The overall running time is $O(NC + QC^2 \log N)$ which is fast enough, but if $O(QC^2 \log N)$ looks a bit slow and bothers you, then you can consider replacing the $C$ updates with a single range update, but this time it's okay to do it without lazy propagation. At the very least it will save you some number of comparisons, particularly near the root. Time Complexity:$O(C (N + Q C \log N))$ where $C$ is an upper bound to the longest length of any repeated substring AUTHOR'S AND TESTER'S SOLUTIONS:
This question is marked "community wiki".
asked 13 Jun '16, 23:28

One thing I couldn't understand : The indices in the update query are not chosen randomly (only the character is selected randomly). So it is very well possible to transform the string such that the number of comparisons exceed the assumption of (24/30/40) chars in s_cycle_compare(). Please correct me if I am wrong. Thanks! answered 23 Jun '16, 16:25

we can also find the lexicographically smallest character within the range [l,r] ,and then concatenate the left part at the end to find the minimal cyclic shift. Then, we will have to deal with characters instead. will this result into a W.A.? answered 26 Jan '17, 10:40
