ABABAABA - Editorial

pushkarmishra · March 14, 2016, 3:12pm

PROBLEM LINK:

Practice
Contest

Author: Sergey Kulik
Tester: Vlad Mosko
Editorialist: Pushkar Mishra

DIFFICULTY:

Easy-Medium

PREREQUISITES:

None

PROBLEM:

Given a string S formed of As and Bs, we need to output a string T that is a subsequence of S and appears only twice in it.

EXPLANATION:

Subtask 1
For this subtask, the length n of string S is less than or equal to 3. That’s a fairly small number. We can easily consider all the cases:

If the string S is of length 1 then the answer is -1.
If the string S is of length 2 then the answer is -1 if the two letters in it are different. If they are the same then the letter is the answer. For example, if S = AA then answer is A.
If the string S is of length 3 then the answer is -1 if all letters are the same. If not then the letter which appears twice is the answer.

Clearly, the complexity is \mathcal{O}(1) per test case.

Subtask 2
For this subtask, the length n of string S can be up to 10. The previous algorithm won’t work anymore. We can use bitmasking though. Let us treat As as 0s and Bs as 1s. We modify the string S to S' which contains 0s and 1s instead of As of Bs. Now, we iterate over each subsequence of S'. There can only be at max 2^{10} subsequences since n can be at max 10. Each subsequence represents a binary number. We convert it to decimal. This decimal must be between 0 and 1023 inclusive. We can keep an array counter[0..1023] to count the number of occurrences of each subsequence. So when a subsequence that is binary representation of number i appears, we add 1 to counter[i]. If there is a subsequence with exactly 2 occurrences, i.e., if counter[i] = 2, we output its corresponding string containing As and Bs.

The complexity of this algorithm is \mathcal{O}(2^n) per test case where n is the length of the string S.

Subtask 3
For this subtask, the length of string S can be as large as 5000. Clearly, the previous two approaches won’t work. We need to make some crucial insights into which subsequences can be possible candidates for the answer.

Let us assume that for a given string S, a string T only appears as a subsequence only twice; once at S \{ i_1, i_2, \dots, i_{|T|} \} and other at S \{ j_1, j_2, \dots, j_{|T|} \} such that i_1 < j_1. But if both of these subsequences form T then so does S \{ i_1, j_2, \dots, j_{|T|} \}. This is a contradiction. In other words, a string T can’t appear ONLY twice in this format wherein the two appearence differ by some position.

That eliminates many cases. The only ones left are these:

When there are only 2 occurrences of a letter in the given string S. For example, S = ABAABAAAA. In this case, the answer is simply that particular letter, i.e., B in this case.
When there exists a double occurrence of a letter which is surrounded by the other letter. For example, S = AABBAABABBA. In this case, BB appears at the index 2 (or 8; we can consider any) and has As before and after it. In this case, the answer is the entire string excluding one out of the two occurrences of B, i.e., AABAABABBA. This can’t work for triplets or more because {N \choose k} \neq 2 for N \geq 3.

So, all we need to do is iterate over the string and either determine if a letter only appears twice or look for a doublet surrounded by the other letter, i.e., the one not forming the doublet. This can be done in one scan of the string S. Thus, the complexity is linear in the size of the input string.

Editorialist’s program follows the editorial. Please see for implementation details.

OPTIMAL COMPLEXITY:

\mathcal{O}(N) per test case where N is the size of input string S.

SAMPLE SOLUTIONS:

Author
Tester
Editorialist

honour_00 · March 27, 2016, 2:01pm

I solved it in the practice mood. And, The test cases are weak .For AAAAAAAAAAABAAAAAAAAAAABAAAAAAAAAAA the answer should be B. But, my previously accepted code printed -1.

p.abhinav · March 27, 2016, 4:59pm

"This can’t work for triplets or more because C(N,k)≠2 for N≥3"

Can anyone explain me why is it so?

shuprog1 · March 27, 2016, 5:29pm

@p.abhinav : C(n,k) is the number of subsequences of length K in a string of length N. Since we need the longest subsequence that apperars only twice, we need C(n,k) to be equal to 2 which is not possible for n >= 3. That is why we consider only double occurrences of a letter surrounded by other letters.

swetanjal · March 28, 2016, 12:04am

I was going through some of the contest solutions. Looks like some of the solutions followed a different strategy rather than the one discussed in the editorial. Would be great if anyone shares any other strategy.
And thanks a lot for such a wonderfully explained editorial.

snk967 · March 28, 2016, 1:20am

I don’t understand why so much emphasis was given(in the problem statement) on the fact that the strings were generated randomly. I mean why does it matter how the test cases were generated? They appear to us random any way! Does randomness hold any significance here?

gandu007 · March 28, 2016, 11:19pm

What about case- ABABABABABABABABABABABABABABABABAB …! Can someone give proof that in this case there doesn’t exist subsequence of S and appears only twice in it ?

an0nh4x0r · April 6, 2016, 11:18pm

Nice Editorial

shuklacoder55 · April 15, 2016, 1:35pm

My answers are correct for all test cases … Still I am getting 10 points … My Submission ID is 9926364 … Can anyone help me ? Thanks in advance

whiteman · June 26, 2016, 10:36am

ABAAAAABB : for this case will the answer be ABAAAAAB ? if Yes then what case does it falls under ?

snk967 · March 26, 2016, 10:46pm

This was a good and tricky problem!

glow · March 27, 2016, 1:18am

in editorialist’s code when we check for count_a==2 or count_b==2 we should also check that both counts are not 2 simultaneously ie. cases like -> AABB

snk967 · March 27, 2016, 2:10am

@glow: Is that really a problem? The solution will still give a correct answer(i.e. “A” in this case).

glow · March 27, 2016, 2:21am

my bad i thought we have to print longest of such string.

snk967 · March 27, 2016, 3:36pm

Test cases were weak for sure, my solution did not deserve 36 points ;).

sharad07 · March 27, 2016, 5:33pm

@p.abhinav Eg: aba|bb|aba …notice there’s a doublet in the middle (N=2).

So first sub sequence would be aba|b|aba (using first ‘b’).
and second would be aba|b|aba (using second ‘b’).

However,if our string was aba|bbb|aba…notice the triplet in the middle (N=3),
number of ways to choose k length sub sequence will never be = 2 as C(3,k) = 1 or 3 always for 1<=k<=3; So we’ll get 3 sub sequences of similar kind.

Similarly for N=4,5… which has been generalized in the above eqn.

mbrc · March 27, 2016, 6:39pm

The string given is random.

pushkarmishra · March 29, 2016, 12:43am

there does exist.

gandu007 · March 29, 2016, 8:03am

i have corrected string , i have mistakenly put two cosecutive “BB” …! my main point is what if string appear like this REGEX=(“AB”)*

geek_geek · March 29, 2016, 11:59am

unless u have a substring like “ABBA” or “BAAB” in the string
or it starts with “BBA” or “AAB”
or ends with “ABB” or “BAA”
the answer is -1