Problem LinkAuthor: Ivan Safonov Tester: Hasan Jaddouh Editorialist: Bhuvnesh Jain DifficultyEASY PrerequisitesPrefix sums, Sliding Window ProblemYou are given a string $S$ of length $N$ and an integer $K$. Consider all the $(N  K + 1)$ substrings of $S$ of length $K$. Find the sum of the hamming distance between 2 consecutive substrings, where hamming distance is defined as the number of positions where the characters of two strings differ. ExplanationFor simplicity, the editorial assumes 0based indexing of the string. Subtask 1: T ≤ 100, N ≤ 50For the first subtask, a naive brute force is enough to solve it. Below is a pseudocode for it.
The complexity of the above approach will be $O(N^2)$ per test case. This is not efficient for the second subtask. Subtask 2: T ≤ 100, N ≤ 100000For this consider, let us consider an example string $S = "aabbcc"$ and $K = 3$, same as in the problem statement. Below table shows the calculation for the above naive approach. The naive approach tries to find the hamming distance of every row with the previous one and add it up. Let us see what happens when we try to find the differences along the column and them sum them up. The below table shows the calculation. What advantage do we have while doing the above calculation in column manner? First, we don't have to deal with full substrings. Instead, we deal with each column independently and all we care about is whether $2$ consecutive characters in $S$ are same or different. The second one is that there is a lot of repetition in the calculation across columns, meaning that same set of different characters are repeated across many columns. For example, in the above case the pair $(a, b)$ corresponding to letters at indices $(1, 2)$ appears in column $1$ and column $2$. So, let us store the difference array, $diff$, where $diff[i]$ will be $1$ if $S[i] != S[i1]$ else $0$. using this, we can clearly see that answer for a column is just a sum of range for $diff$ range. Using prefix sums, we can calculate the answer for every column in $O(1)$. Then, we can add the contribution to all possible columns and find the overall answer. Below is pseudocode for the above approach:
The time complexity of the above approach will be $O(N + K)$. The precomputation will consist of 2 parts. First, finding the $diff$ array and then calculating the prefix sum of $diff$ array. Each step takes $O(N)$ complexity. Then, we find the contribution for every column, which is $K$ in total. The space complexity will be $O(N)$ for this approach. You can improve the space complexity to $O(1)$ as well using slidingwindow to approach to calculate the prefixsums on the fly and hence the answer for every column. For more details, you can refer to the editorialist's solution for help. Bonus ProblemConstruct a test case where the answer will overflow i.e. will not fit inside integer type and we would require the use of "long long" in C++ and "long" in Java. Feel free to share your approach, if it was somewhat different. Time Complexity$O(N + K)$ per test case. Space Complexity$O(N)$ Solution Links
This question is marked "community wiki".
asked 25 Jun '18, 23:44

my approach is kind of common sense approach. first step>>> As first we calculate maximum no. of transaction an element can do that is min( nk , k). max no of transaction an element can do min of: 1: k bcoz if we have a b c d e f then c can go 3 times transaction with d then d after that c will be gone . a b c > b c d> c d e> d e f. 2: nk bcoz if we have four element for ex a b c d , k=3 then a b c> b c d; everyone has gone 1 transaction. second step>> then we calculate array of transaction element will do. 1: first element can do 1 transaction ; second can do either 2 or 1 if max limit is 1; third element can do 3 transaction if max limit is not reached otherwise it will do transaction of max limit ;like this we build the array but 2: last element can do 1 transaction ; second last can do 2 only if max limit is not reached ; third last can do 3 if max limit is not reached else it will do that much transaction. now ex to understand above approach here k =3 ; just for understanding elements; transaction array; max transaction can do; a b c d;;;;;;;;;;;[1 1 1];;;;;;;;;;; 1 a b c d e;;;;;;;;;;;[1 2 2 1];;;;;;;;;;; 2 a b c d e f;;;;;;;;;;;[1 2 3 2 1];;;;;;;;;;;3 a b c d e f g;;;;;;;;;;;[1 2 3 3 2 1];;;;;;;;;;;3 if 8 elements then ;;;;;;[1 2 3 3 3 2 1];;;;;;;;;;;3 "similarly it go with any of k" third step>>>now if s[i]!=s[i+1] then we just add the cost(count+=a[i]) as we know how many times it will go transaction with that element ; if s[i]==s[i+1] then no cost taken. link for my code https://www.codechef.com/viewsolution/19032639 for better understanding please refere to link as code contain comments answered 01 Jul '18, 00:00
2
basic approach is just to cal no of transaction bw i and i+1 that is through transaction array i created and ping me if anyone had problem
(01 Jul '18, 00:05)

Input Test Case for overflow: Output Which will overflow and not fit in integer datatype answered 01 Jul '18, 02:50

@mohitchandrak: Imagine a character $s_i$ in the string that is in cell $c$ (in the board) on the $k$th step. Then the successor of that character $s_{i+1}$ will be on the cell $c$ in the $k+1$th step, if it happens (call this an event). If $s_i \ne s_{i+1}$, then it requires one unit of power. Now the total power that is used over all operations can be thought of as a net effect of all such pairs $(s_i, s_{i+1}), s_i \ne s_{i+1}$. For each such pair, we can figure out on how many cells this event happens (it is $K$, except when the characters are among the first $K$ or last $K$ positions of the string, which is what the linked code considers). answered 30 Jun '18, 23:14

This solution is also in o(1) space complexity and o(n) time complexity. Plzz check it out. check here answered 02 Jul '18, 21:02

I cannot understand the solution, how prefix sum is used to get the answer, can someone please elaborate the intuition behind the author's solution? answered 01 Jul '18, 13:22
The author's solution is same as sliding window. Instead of maintaining the prefix sum, he calculates it on the fly. Can you explain what part is not clear in the prefix sums one? BTW, do you know what is prefix sum and how it is used?
(02 Jul '18, 04:05)
After thinking foe a while, I get it now! @likecs thanks for the reply.
(02 Aug '18, 10:08)

Thanks for telling about the O(1) space complexity approach :) It had never occurred to me. answered 01 Jul '18, 13:25

Thanks for this, never heard about prefixsum before :P, however, I tried solving this question using segment tree, although it ended up giving TLE error on subcase #2, my approach was to build an array 'ARR' of size N1, and each index i of this array is either 1 or 0 ARR[i] = 1 denotes that string's ith and i+1 th character are different and so ARR[0] denotes that i th and i+1 th character are same so if string is : a a b b c c then ARR is : 0 1 0 1 0 then using this array, build a sumquery based segment tree ended up being O((NK)*log(N)) answered 02 Jul '18, 02:05
The solution with segment trees should also fit in time limit, though being slower. I went through your code and there was a problem in the query function which can take $O(N)$ complexity as you are not using the fact that sum for range has been calculated and you end up traversing each node till the end/ I suggest you to go through this easy tutorial first: http://se7so.blogspot.com/2012/12/segmenttreesandlazypropagation.html and then reimplement the solution using segment trees to get better understanding.
(02 Jul '18, 04:10)

anyone can explain the logic behind prefix array & sliding_window used in this question? answered 28 Jul '18, 00:07
