AMSTRING - Editorial

pragrame · April 22, 2013, 12:03am

Problem Link:

Practice

Contest

Difficulty:

EASY

Pre-requisites:

Sliding window, strings

Problem:

Given two strings A and B of equal length N, find how many substrings of equal length have a difference not greater than K (Substrings are denoted by L1, R1, L2, R2 : A[L1…R1], B[L2…R2], R1-L1 = R2-L2. Here, difference is the number of characters at that differ in the same corresponding positions. Also, substrings are different if their positions in the string are different.

Quick Explanation

For strings S and T of equal length, and characters c and d, diff(S+c, T+d) = diff(S, T) + 1 or 0 (depending on whether c = d or not).
Similarly, diff(S - {its first character}, T - {its first character}) = diff(S, T) - 1 or 0 (depending on whether their first characters are different or not).

Using the above, we can get diff(L1, R1, L2, R2) from diff(L1, R1-1, L2, R2-1) in O(1) time.

Also, if X is a substring of S and Y is a substring of T (at same positions), then diff(X, Y) <= diff(S, T). Thus, for a fixed L1, L2, once we find the minimal length for which diff(A[L1…R1], B[L2…R2]) > K, we know that for all larger lengths and those starting positions, the difference > K.

The above reasoning yields the following algorithm:


Initialize L1, L2 globally.
R1 = L1, R2 = L2, diff = A[L1] != B[L2];	//start with length 1
while(R1 < N && R2 < N)
	if(diff == K+1)
		ans += R1 - L1;	// quadruplets (L1, L1, L2, L2)...(L1, R1-1, L2, R2-1) are valid;
		diff -= (A[L1] != B[L2]);
		L1++, L2++;
	else 
		R1++, R2++;
		if(R1 < N && R2 < N)
			diff += A[R1] != B[R2];
//now we still have (L1, L1, L2, L2) ... (L1, R1-1, L2, R2-1) as well as (L1+1, L1+1, L2+1, L2+1) ... (L1+1, R1-1, L2+1, R2-1) ... etc. to be counted
len = min(N-L1, N-L2);
ans += len * (len+1)/2;		//all substrings of this string get counted.

Finally, while the above procedure takes O(N) time, we need to ask, how many times do we need to Initialize L1, L2 and to what values should they be initialized. Note that once initialized, the offset of the starting positions between the substrings remains constant. Thus, we should initialize L1, L2 with all possible beginning offsets.


ans = 0
for(i = 0; i < N; i++)
	run above procedure with L1=i, L2 = 0
for(i = 1; i < N; i++)
	run above procedure with L1=0, L2 = i
Output ans

The time complexity of the whole procedure is thus O(N^2) per testcase.

Explanation:

Let us begin with as naive an approach as possible and try to improve from there. A brute force solution would vary over all possible L1, L2, length and check what is the value of the difference of strings. Pseudocode would be:


Diff(string S, string T)
val = 0;
for(int i = 0; i < S.length(); i++)
	if(S[i] != T[i])
		val++;
return val;

solve(string S, string T)
ans = 0;
for(int L1 = 0; L1 < N; L1++)
	for(int L2 = 0; L2 < N; L2++)
		for(int R1 = L1, R2 = L2; R1 < N && R2 < N; R1++, R2++)
			if(Diff(S.substring(L1 to R1), T.substring(L2 to R2)) <= K)
				ans++;
Output ans;

The above algorithm would take O(N) time for the Diff function, and would be called O(N^3) times, giving a complexity of O(N^4). This is unfortunately too slow.

We need to be more intelligent in calculating Diff or in calling the function Diff. We notice that A[L…R] is almost the same as A[L+1…R] or A[L…R+1], and similarly for B. Indeed, if we were to compare Diff values

A[L1…R1+1] = A[L1…R1]x

B[L2…R2+1] = B[L2…R2]y

and hence diff(A[L1…R1+1], B[L2…R2+1]) = diff(A[L1…R1], B[L2…R2]) + (x!=y).

Similarly, diff(A[L1+1…R1], B[L2+1…R2]) = diff(A[L1…R1], B[L2…R2]) - (A[L1] != B[L2]).

Well, that takes care of calculating difference efficiently, atleast somewhat. Now can we avoid calculating difference unnecessarily too? Notice from equation:

“diff(A[L1…R1+1], B[L2…R2+1]) = diff(A[L1…R1], B[L2…R2]) + (x!=y)” the fact that

diff(A[L1…R1+1], B[L2…R2+1]) >= diff(A[L1…R1], B[L2…R2]).

Thus, once we find minimal length l such that diff(A[L1…L1+l-1], B[L2…L2+l-1]) > K, then for any length >= l, we will get the difference > K.

Finding what is the maximal R1, R2 such that for fixed L1, L2 we have diff(A[L1…R1], B[L2…R2]) <= K, and then varying over possible L1, L2 yields the algorithm described in Quick Explanation, which is O(N^2).

Setter’s Solution:

Can be found here

Tester’s Solution:

Can be found here

milhaus · April 22, 2013, 12:11am

Could somebody please explain to me why I got NZEC? I would understand TLE, but why NZEC? Thank you CodeChef: Practical coding for everyone

betlista · April 22, 2013, 12:21am

May I ask what that part in Problem paragraph refers to

Also, substrings are different if their positions in the string are different.

because when we have strings A=“aab” and B=“caa”, A[1,2] and B[2,3] are the same…

kierguevara · April 22, 2013, 12:31am

I’m getting RTE on my submission CodeChef: Practical coding for everyone. I don’t know why

prakash1529 · April 22, 2013, 9:04pm

When will this question be moved to Practice section ? I can’t find it under “Easy” or “Medium”.

sourcewizard · April 28, 2013, 3:38am

I have a doubt that there is something fishy in the algorithm mentioned in editorial.
I wrote a code based on above algorithm and it got accepted.
here is the link to my submission : http://ww2.codechef.com/viewsolution/2074155

here is the fastest code in cook off: http://ww2.codechef.com/viewsolution/2067206
both got accepted

my test input :

1
5 2
a b b a b
a a b b a

output for my code is : 49
output for other code is 43

my reasoning in this particular test case.
for L1 = 0 and L2 = 0 , the [ans] value should be 14 whereas according to algorithm mentioned in the editorial it will be 15 which is wrong.
can someone please investigate and clear this doubt.

thanks

sap88coder · October 5, 2013, 10:58am

One Up Vote is not enough for this editorial…very very well written. Thanks a lot

divanshu · April 22, 2013, 12:35am

Submissions were getting NZEC(RTE) instead of TLE due to server problem. An official CodeChef announcement was also made for the same.

inseder · April 22, 2013, 12:35am

@mihaus i had the same problem - mine was getting always NZEC after 1.99 sec no matter what i would have done in the code - actually i think the program was ended after 2 seconds and instead of TLE the NZEC occured based on a “bug”

pragrame · April 22, 2013, 12:58am

It means A[1,1] is different from A[2,2], even though both are substrings and are “a”. We consider substrings of A in terms of positions, not in terms of what the substring is.

betlista · April 22, 2013, 1:00am

I see, thanks for clarifying.

betlista · April 22, 2013, 9:32pm

In paragraph “Problem Link:” there is a link…

prakash1529 · April 22, 2013, 9:42pm

Got it!! Thanks

bugkiller · April 23, 2013, 12:49am

Setter’s Solution > Page Not Found!

betlista · April 28, 2013, 3:42pm

Thanks for pointing that. I wrote brute force algorithm and it seems that result is 51 - gPQr41 - Online Java Compiler & Debugging Tool - Ideone.com

sameer47 · April 28, 2013, 4:06pm

All of the above codes give output as 51 only. The problem is that you are giving input wrong. There should be no space between characters

sourcewizard · April 28, 2013, 6:40pm

yup, spaces was a mistake . thanks for pointing out.