You are not logged in. Please login at www.codechef.com to post your questions!

×

TASTR - Editorial

23
14

PROBLEM LINKS

Practice
Contest

DIFFICULTY

MEDIUM

PREREQUISITES

Suffix Array, Longest Common Prefix Array

PROBLEM

Given two strings a and b, let A be the set of all substrings of a, and B be the set of all substrings of b.

Find the number of unique strings in A plus the number of unique strings in B, that are not common to both A and B.

QUICK EXPLANATION

Efficient solutions to a large set of problems on strings are approachable by use of Suffix Arrays (or Trees). If you have not yet added Suffix Sorting (for construction of Suffix Arrays) to your skill-set, then this is the perfect problem to do so.

Suffix Sorting can be done using

  • Manber Myers algorithm in O(L log L) time
  • Karkkainan Sander's algorithm in O(L) time

Longest Common Prefixes of adjacent items in the list of sorted suffixes is a classical problem in strings. Most texts discuss construction of Suffix Arrays immediately followed by calculation of LCP arrays.

The LCP Array can be found by augmenting the Suffix Sorting algorithms above. The time complexity for LCP Array calculation will be exactly equal to the time complexity of the Suffix Sorting. You can find it as an entirely different step as well, following the Suffix Sorting.

Both O(L) and O(L log L) approaches will fit within the time limit for this problem. Hence, choose as you please :)

Let us assume that we can find the number of unique sub-strings of a string by use of the LCP array for the suffixes of that string. (We will see the algorithm to do so in the EXPLANATION section)

U(A) = number of unique strings in A

We are given two strings a and b and their set of substrings A and B respectively. We wish to find

U(A) + U(B) - U(A intersection B)

Which is equal to

U(A) + U(B) - (U(A union B) - U(A) - U(B))

Or

2*U(A) + 2*U(B) - U(A union B)

EXPLANATION

Let us see how to find the number of unique substrings of a string by using the LCP array.

  • If we consider all the prefixes of all the suffixes, we would be considering the entire set of sub-strings.
  • The LCA array helps us determine how many prefixes to ignore for each suffix
  • We of course do not ignore any prefix of the first suffix. These are all valid and unique substrings.
Let s be given string
    indexes in s are 1-based
Let S be the suffix array
    S stores the list of 1-based indexes
    that represent the start position of the suffix of s
Let L be the LCP array
    L(i) is the longest common prefix between
    the suffixes starting from S(i) and S(i-1)
    Thus, L is only defined from 2 to |s|

uniq_sub_strings = |s| - S[1] + 1
// thus we count all prefixes of the first suffix

for i = 2 to N
    uniq_sub_strings += |s| - S[i] + 1 - L[i]

Let us try this with an example to see why this is correct.

s = abaabba
S = [
    7,    // a
    3,    // aabba
    1,    // abaabba
    4,    // abba
    6,    // ba
    2,    // baabba
    5     // bba
]
L = [
    0,    // not defined for L[1]
    1,    // a is the common prefix
    1,    // a
    2,    // ab
    0,    // nothing
    2,    // ba
    1     // b
]

uniq_sub_strings =
    1 + 4 + 6 + 2 + 2 + 4 + 2 = 21

Thus, there are 21 substrings (out of 28) that are unique. You can work this out by deducing that the sub-strings that are not unique (and hence weren't counted are)

{
    a,  // prefix of aabba
        // since a was counted as prefix of "a"
    a,  // prefix of abaabba
    a,  // prefix of abba
    ab, // prefix of abba
        // since ab was counted as prefix of "abaabba"
    b,  // prefix of baabba
        // since b was counted as prefix of "ba"
    ba, // prefix of baabba
        // since ba was counted as prefix of "ba"
    b   // prefix of bba
}

Now, you can find U(A) and U(B) by using the above algorithm. The only part that remains is to calculate U(A union B).

This can be done by considering a string

c = a + "$" + b

We can find all the unique substrings of c and reduce from the result, all the strings that contain the character '$'. We know that all the different strings that contain '$' are unique anyway, since '$' is not part of the alphabet.

There are (|a|+1) * (|b|+1) substrings of c that contain '$'.

Thus, we can find U(A union B), the number of unique substrings that exists in either A, or B.

We can find the number of unique sub-strings of either a, or b, but not both, in time O(F(|a|) + F(|b|) + F(|a+b|), where F(n) is the complexity of calculating the Suffix Array or the LCA Array (which ever is more). In the best implementation, F(n) = O(n). But the problem may be solved even with F(n) = O(n log n).

SETTER'S SOLUTION

Can be found here.

TESTER'S SOLUTION

Can be found here.

This question is marked "community wiki".

asked 25 Mar '13, 00:16

gamabunta's gravatar image

1★gamabunta
2.4k128183169
accept rate: 14%

edited 25 Mar '13, 00:16

1

This problem really showed our "level" :)

(25 Mar '13, 00:25) abhinav15922★

I did the same thing as the tester . My solutions (during the contest[http://www.codechef.com/viewsolution/1964089]) and after the contest with some optimizations (http://www.codechef.com/viewsolution/1964221) both TLE .. Is this too strict for Java ? EDIT: I think he has a better Suffix array function.

(25 Mar '13, 00:35) phantom115★

My first solution constructed the suffix array (in O(Nlog(N)) time) for each string independently (A, B and the union of A and B) and I got Time limit exceeded. I had to change my algorithm so that it computed everything only from the suffix array (and the LCPs) of the union of A and B. I thought that was intended, but now I see that I probably have some inefficiencies in my suffix array routines, which I will have to identify.

link

answered 25 Mar '13, 00:44

mugurelionut's gravatar image

7★mugurelionut
10.0k266990
accept rate: 18%

My O(n(logn)(logn)) passes !

(25 Mar '13, 00:48) javadecoder4★

my nlogn passed

(27 Mar '13, 01:25) ab12342★

does the solution that uses hashes exist ?

link

answered 25 Mar '13, 00:51

k0stia's gravatar image

4★k0stia
416
accept rate: 0%

Yes @acmonster construct LCP array using hashes in O(N log^2 N) time. We can compare two suffixes in O(log N) time using hashes. So we can sort them in O(N log^2 N) time. And then calculate LCP array in O(N log N) time.

(25 Mar '13, 01:10) anton_lunyov ♦6★

Where can i read more about suffix array and LCP array? (algorithm and application)

link

answered 25 Mar '13, 15:38

prakhs123's gravatar image

4★prakhs123
33641019
accept rate: 3%

1

A very good pdf on suffix arrays for contest can be found here (http://www.stanford.edu/class/cs97si/suffix-array.pdf)

Another source is http://e-maxx.ru/algo/suffix_array but make sure you open it in Chrome.

(25 Mar '13, 18:08) phantom115★

stanford link is not working!!!

(25 Mar '13, 21:19) prakhs1234★
1

In the stanford link ')' is part of the hyperlink pasted here . Remove that and paste the link in your browser and you will able to access the pdf . Or use this link : http://www.stanford.edu/class/cs97si/suffix-array.pdf

(25 Mar '13, 21:21) vineetpaliwal6★

Amazing tutorial...gr8t work!!!!!

link

answered 26 Mar '13, 13:39

blacklisted's gravatar image

3★blacklisted
2954919
accept rate: 0%

http://www.codechef.com/viewsolution/1966634

My solution works on all test cases i found

but still gives runtime error

please help

link

answered 26 Mar '13, 18:29

ajay09's gravatar image

2★ajay09
1
accept rate: 0%

@anton_lunyov sir may u plzz provide me the test cases where this solution is failed http://www.codechef.com/viewsolution/1967060 thanx in advance..:)

link

answered 27 Mar '13, 00:19

ab1234's gravatar image

2★ab1234
1404514
accept rate: 10%

3

Try large test where both strings have equal characters.
The first test in the system has the form
string(99990,'x')
string(99969,'x')
and the answer obviously is 21,
while your answer is 25769803797.

(27 Mar '13, 00:46) anton_lunyov ♦6★

thank you sir..:) accepted..

(27 Mar '13, 00:58) ab12342★

I can't seem to visualise this part

U(A) + U(B) - U(A intersection B) Which is equal to U(A) + U(B) - (U(A union B) - U(A) - U(B))

Isn't U(A intersection B) supposed to be = to U(A) + U(B) - U(A union B)

link

answered 27 Mar '13, 19:03

ishanbhatnagar's gravatar image

5★ishanbhatnagar
8502614
accept rate: 21%

edited 28 Mar '13, 10:56

Might be that we need to find U(A) + U(B) - 2*U(A intersection B) because common substrings appear twice.

(07 Apr '13, 23:58) ujjaval_k_s2★
1

If U(a) is taken to represent the set of all unique substrings of a, and |U(a)| its size, then the final expression should be: 2*|U(a) UNION U(b)| - |U(a)| - |U(b)|

(31 Jul '13, 22:44) shubh094★

Can someone help me with the stdin and stdout for node.js. My code is as follows

process.stdin.resume();

process.stdin.setEncoding('utf8');

process.stdin.on('data', function (chunk) { var lines = chunk.toString().split('n');

var stringDiff = function(a,b){
    var aSet = getAllSubstring(a);
    var bSet = getAllSubstring(b);
    var nSet = difference(aSet, bSet);
    nSet = nSet.concat(difference(bSet, aSet));

    return nSet.length;
}

var difference = function(a,b){
    var nSet = [];
    for(var i in a){
        var found = false;
        for(x in b){
            if(a[i] === b[x]){
                found = true;
                break;
            }
        }

        if(!found){ nSet.push(a[i])};
    }
    return nSet;
}

var getAllSubstring = function(s){
    var aSet = [];
    var i = 1;
    var j = 0;
    while(i<=s.length&&j<s.length){
        var next = s.substring(j,i);
        if(aSet.indexOf(next)<0){
            aSet.push(next);
        }

        if(i===s.length){
            j++;
            i = j+1;
        }else{
            i++;
        }
    }
    return aSet;
}

var diff = stringDiff(lines[0], lines[1]);
process.stdout.write(diff+'\n');
process.exit();

});

link

answered 27 Mar '13, 23:44

mzazzali's gravatar image

0★mzazzali
1
accept rate: 0%

@anton sir would you please tell me where is my code failing here's my code :(

link

answered 28 Mar '13, 23:38

dawnavd's gravatar image

4★dawnavd
2614510
accept rate: 0%

oops there was a silly mistake, got AC :) learnt a lot from this question

(28 Mar '13, 23:45) dawnavd4★

@anton_lunyov : sir please help me every time i get a run time error(SIGSEGV)...

http://www.codechef.com/viewsolution/1966634

link

answered 29 Mar '13, 07:56

ajay09's gravatar image

2★ajay09
1
accept rate: 0%

hello, could you just tell me weather my code's output is correct or not, irrespective of time exceed error..

link

answered 29 Mar '13, 12:54

kushubham9's gravatar image

0★kushubham9
1111
accept rate: 0%

why doesnt work?

module Main where

import Data.List
import System.Exit

al1 xs = nub . concat . map (drop 1 . inits) . tails $ xs

cmp xs xz = [ x | x <- ((al1 xs)++(al1 xz)), notElem x (intersect (al1 xs) (al1 xz) )]

main = do
    line <- getLine
    line2 <- getLine
    putStr $ show $ length $ cmp line line2
    exitWith ExitSuccess
link

answered 31 Mar '13, 08:24

ivanturgenev's gravatar image

0★ivanturgenev
11
accept rate: 0%

edited 31 Mar '13, 08:31

works fine here runs but (time exceed) import List Import System

http://ideone.com/rTM9av

(31 Mar '13, 08:30) ivanturgenev0★

This is my code in python. it works fine on my system but dont know why it is not working in codechef

http://ww2.codechef.com/viewsolution/2075046

link

answered 29 Apr '13, 16:27

rahulmanthena's gravatar image

0★rahulmanthena
1
accept rate: 0%

Probably you misunderstand the idea of contest programming - your code have to return correct answer for each available input, not only the one in problem statement and your program return 8 for each input, simply because you are NOT reading input from stdin...

(29 Apr '13, 19:16) betlista ♦♦3★

getting run time error NZEC. first submission in codechef. so dont know much abt online compilers. plz help me out sir http://www.codechef.com/viewsolution/5666193

link

answered 29 Dec '14, 14:01

devkothari's gravatar image

2★devkothari
1
accept rate: 0%

First try to comment System.exit(0); - killing JVM is not a good idea in general...

(29 Dec '14, 15:14) betlista ♦♦3★

I'd say, that problem is with memory, you want to allocate array of strings of length n^2 (and n is 100000).

(29 Dec '14, 15:22) betlista ♦♦3★

array of strings with no of strings is n2, where n2 being 10 for a four letter word 10=4+3+2+1 so n2=no of letters +(no of letters-1)+.....+ 1

(01 Jan '15, 13:57) devkothari2★

and sir system.exit(0) presence or absence has no effect in this case in the runtime error.. any other suggestions sir ? thanks

(01 Jan '15, 13:58) devkothari2★

Wow.. that was a quite a sleek approach... Nice one

link

answered 16 Jan '15, 20:23

usaxena95's gravatar image

7★usaxena95
2311312
accept rate: 0%

toggle preview
Preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported
  • mathemetical formulas in Latex between $ symbol

Question tags:

×15,643
×1,672
×643
×90
×36
×12

question asked: 25 Mar '13, 00:16

question was seen: 12,525 times

last updated: 16 Jan '15, 20:23