Questions Tagged With suffix_automatahttps://discuss.codechef.com/tags/suffix_automata/?type=rssquestions tagged <span class="tag">suffix_automata</span>enWed, 06 Aug 2014 14:26:54 +0530LCS via Suffix Automatonhttps://discuss.codechef.com/questions/48683/lcs-via-suffix-automaton<p>Need help in finding the LCS of two strings vis suffix automata. |S|<=25000.
I am able to construct the suffix automata for a pattern P[] but am unable to understand how to calculate the LCS.</p>
<p>How do I use the transition function?</p>
<p>PS:Query is because I was unable to understand the editorial of SSTORY.</p>
<p>Any help is appreciated!
Thanks.</p>ironmandhruvMon, 28 Jul 2014 18:49:41 +0530https://discuss.codechef.com/questions/48683/lcs-via-suffix-automatonsuffix_automatalcsHow can we determine the number of occurrence of a substring using suffix automaton?https://discuss.codechef.com/questions/49095/how-can-we-determine-the-number-of-occurrence-of-a-substring-using-suffix-automaton<p>When I have build a suffix automaton for string T(suppose T="ababcab"), then if I want to know how many times a sub-string occurs in (i.e how many times Pattern P = "ab" occurs in T), how can I do that?
EDIT: I have found an implementation like this:
let cnt[v] is the For all the states v number of occurrence of the substrings associated with v.
Initially if state v was not cloned then cnt[v] = 1 , otherwise it's 0. We go through each state in descending order of length. and for each state we do cnt[link[v]]+=cnt[v]; After that , if u is the state corresponding to pattern P , then cnt[u] is the ans.
But I can't really understand why this works?</p>tamimcsedu19Wed, 06 Aug 2014 14:26:54 +0530https://discuss.codechef.com/questions/49095/how-can-we-determine-the-number-of-occurrence-of-a-substring-using-suffix-automatonsuffix_automatastringSUBQUERY - Editorialhttps://discuss.codechef.com/questions/46075/subquery-editorial<p><strong>Problem link</strong> : <a href="http://www.codechef.com/LTIME13/problems/SUBQUERY">contest</a> <a href="http://www.codechef.com/problems/SUBQUERY">practice</a></p>
<p><strong>Difficulty</strong> : Medium</p>
<p><strong>Pre-requisites</strong> : Suffix Automaton</p>
<p><strong>Problem</strong> : Given a string <b>S</b> and <b>N</b> queries. Each query is defined by two integers - <b>L<sub>i</sub></b> and <b>P<sub>i</sub></b>. Count the number of strings of the length <b>L<sub>i</sub></b> that occur <b>exactly P<sub>i</sub></b> times (as the consecutive substrings) in the string <b>S</b>.</p>
<h2>Explanation</h2>
<p>This problem is the hardest in the set. At least, according to my estimations. There are some simple solutions that can be coded without putting a lot of effort, though, so let's consider them first.</p>
<h3>How to get 13 points</h3>
<p>The constraints are very low, so you can do a pure brute force. For example, you can store all the substrings in the STL map along with the number of their occurences and when you get a query, you can iterate through all the substrings in the map and for the each of them you can check that the length and the number of occurences is the same with what is asked in the corresponding query.</p>
<h3>How to get 40 points</h3>
<p>Now <strong>N</strong> is larger so we have to store the information about all the substrings in the more efficient way. For example, using a <a href="http://en.wikipedia.org/wiki/Trie">trie</a>. It is possible to construct the trie, containing all the suffixes (=all the substrings) of the string <strong>S</strong> in <strong>O(|S|<sup>2</sup>)</strong> time. Moreover it's possible to store the depth and the number of occurences of the associated substring for each trie's node. Knowing that there will be no more than <strong>O(|S|<sup>2</sup>)</strong> such pairs, we can just count the amount of occurences of each pair and then answer queries in <strong>O(1)</strong> time.</p>
<h3>How to get 100 points</h3>
<p>Here we need some suffix structure in order to store the information about the substrings. Writer's choice is the suffix automaton. The complete its' description is beyond this editorial, because it requires a lot of space and theoretical results. So, from now on, we will consider that you're familiar with the suffix automaton. If you're not, here are some basic facts that will be crucial in out solution:</p>
<ul>
<li>The suffix automaton is a finite automaton that has letters associated with its' edges. </li>
<li>Any path from the root's node corresponds to some substring of the string <strong>|S|</strong> (of course, assuming that we've built the suffix automaton for the string <strong>S</strong>). And vice versa, any substring of <strong>S</strong> can be obtained by going along some path from the root node of the suffix automaton. So we can form the following criteria: the string <strong>T</strong> is a substring of <strong>S</strong> <=> there's a path from the root node of the suffix automaton for the string <strong>S</strong> that has the string <strong>T</strong> associated with it.</li>
<li>The suffix automaton has <strong>O(|S|)</strong> nodes. In the general case, no more than <strong>2|S|</strong> nodes.</li>
<li>Each node has a set of substrings associated with it.</li>
<li>All the substrings associated with a single node occur the same number of times.</li>
<li>All the substrings associated with a single node are the suffixes of the largest substring, associated with this node.</li>
</ul>
<p>Knowing this, we obtain the following solution:</p>
<ul>
<li>At first, we build the suffix automaton for the given string</li>
<li>Then, for each its' node we calculate the number of occurences of its' associated substrings, as well as the lowest and the highest length of the string, associated with this node.</li>
<li>Now we obtain a set of segments parallel to Y-axis, where X-axis corresponds to the number of occurences and the Y-axis corresponds to the length. There are as much these segments as the nodes in the suffix automaton.</li>
<li>And now the problem is: given <strong>O(N)</strong> segments, parallel to Y-axis and queries of the form (<strong>X</strong>, <strong>Y</strong>). The answer for a query is the number of the segments that cover the point with the coordinates (<strong>X</strong>, <strong>Y</strong>).</li>
</ul>
<p>This new problem can be solved different. One of the ways is to note that different <strong>X</strong>-coordinates' segments are independent. So we can solve the 1D case of this problem. While solving the 1D case, it's easy to see that the number of the segments that cover the point <strong>X</strong> equals to <strong>A-B</strong>, where <strong>A</strong> equals to the number of the segments that end later or equal than in <strong>X</strong> and <strong>B</strong> equals to the number of the segments that begin after <strong>X</strong>.</p>
<h3>Related links</h3>
<ul>
<li>A <a href="http://www.cs.nyu.edu/~mohri/pub/nfac.pdf">paper</a> on suffix automaton</li>
<li>The <a href="http://www.codechef.com/APRIL12/problems/TSUBSTR">TSUBSTR</a> problem from CodeChef April 2012 Long Challenge, that also requires you to use the suffix automaton, and its' <a href="http://discuss.codechef.com/questions/3697/tsubstr-editorial">editorial</a></li>
</ul>
<p><strong>Solutions</strong> : <a href="http://www.codechef.com/download/Solutions/LTIME13/Setter/SUBQUERY.cpp">setter</a> <a href="http://www.codechef.com/download/Solutions/LTIME13/Tester/SUBQUERY.cpp">tester</a></p>xcwgf666Sun, 29 Jun 2014 14:01:51 +0530https://discuss.codechef.com/questions/46075/subquery-editorialltime13triemediumsuffix_automata