Past kickstart problem

How to solve
this
problem for large data set . I saw the code of other contestants it seems like they are using divide and conq but i am not able to fully understand it .

I think we can do this using divide an conquer and fft(convolution).

Lets say we have arr= 1 4 3 6

Now we divide this into two halves 1 4 and 3 6
To calculate the the sum of range [l,r] where l is in left part and r in right we would need to merge the suffix of left part with prefix of right part.

Now calculate the suffix sum for 1 4 which is 5 4 and prefix sum for 3 6 which is 3 9

So now lets merge so (5+3),(5+9),(4+3),(4+9)

So these sums are possible if l is in left and r in right halve.

This can be done using fft.
Let us express left halve as x^4 + x^5 and right as x^3 +x^9. Do their multiplication,the coefficient represents the occurrence.(just keep a global frequecy map and update accordingly)

Now similarly calculate for left and right part recursively.

Complexity aroung nlog^2n

1 Like

I saw a few submissions at the top of the leaderboard and I don’t think they are using divide and conquer. My solution is quite similar to these and the only things used are prefix sums and binary search.

Let us call the input array A and let S_{i,j} = \sum_{k=i}^j A_k. Now consider all the subarray sums:
S_{1,1}, S_{1,2}, S_{1,3}, \cdots , S_{1,N} \\ S_{2,2}, S_{2,3}, \cdots, S_{2,N} \\ \vdots \\ S_{N-1,N-1}, S_{N-1,N} \\ S_{N,N}

When arranged in this particular order, observe that each row is a sorted sequence (there are no negative numbers in A).

How does this help? If you construct the prefix sum array you can access all these sums in constant time, so you effectively have these N sorted sequences ready for use.
Now given a value x you can binary search on these N sequences to count how many elements in each sequence are \le x. If you add these counts up, you can get the total number of sums S_{i,j} which are \le x.

Next you can binary search on x to find the element at index k in the sorted sequence of all S_{i,j}.
And if you know this element at k, you can also find the sum of all S_{i,j} which are \le this element using a prefix sum array of the initial prefix sum array.

Now the final step is just to compute this sum for k=R and k=L-1 separately and print the difference. The total complexity is \mathcal{O}(Q \cdot N \cdot \log{N} \cdot \log{100N}), where the 100 is the limit on the maximum value of an element of A.

3 Likes

Since the sum of every subarray is just the difference of two prefix sums, we can create two polynomials corresponding to the prefix sums and their negative. Let the prefix sums of the array be s1,s2,…sn.

P1 = x^s1 + x^s2 + … + x^sn

P2 = 1 + x^(-s1) + x^(-s2) + … + x^(-sn) (here,the term 1 is for the empty prefix)

The coefficient of x^i in P1*P2 will be the number of subarrays having sum ‘i’. However, since we can’t deal with negative powers, we will multiply p2 with x^S (S>=sn). Now, the coefficient of x^i will represent the number of subarrays having sum i-S. We will only take the coefficients corresponding to positive values ie. i-S >0 or i>S. We can store these counts in an array and binary search for each query.

The complexity of this solution will be O(AlogA + QLogA) where A = 100*N.

5 Likes

Can you provide a link to any solution that uses divide and conquer? I know a way to solve this but in a different way.

@meoooow If possible can u explain your method?

@meooow u can see the solution in the rank list !

Thanks @vivek_1998299 for the explanation i dont understand the fft part “Do their multiplication,the coefficient represents the occurrence.(just keep a global frequecy map and update accordingly)”…

See the possible sums should be (5+3),(5+9),(4+3),(4+9)

Now (x^5 + x^4)*(x^3 + x^9)= x^(5+3) + x^(5+9) + x^(4+3) + x^(4+9)

As u can see coefficents of multiplication is 1 for all terms,ie (5+3),(5+9),(4+3),(4+9) occured once.
(The coefficient of x^y represents the number of occurence of y)

Now this multiplication can done faster using fft.

@vivek_1998299 but after creating N*(N+1) its sorted will this work for sorted array(N*(N+1)) too?

Yeah ,the whole reason for doing this divide and conquer and fft was to create a frequency map of sums of n*(n+1)/2 subarrays.If u see then only n*100 values can be possible.After constructing frequency map We can then easily answer any query.

I mean lets say frequncy map is:

1 - 2 times
3 - 1 time

I want [1,3] which is 1 * 2 + 3 * 1 ,(i mean O(n * 100))

(Or rather preprocess to answer in O(logn) )

@vivek_1998299 can you explain your approach a bit more? Also if you can share some working code that would be great.

Ohk,so put forward simply,first we calculate sum of all subarrays[l,r] whose l lies in left ,r lies in right part there are about n^2/4 such subarrays.Now the only subaarays that remain are the one whose [l,r] both lies in left part ,or both lies in right part.So we could do this recursively over left and right part.And for a particular level of recursion tree the complexity is O(nlogn) (for fft merging),there are logN levels so total complexity O(nlogn^2)

The recurrence relation is T(n)=2T(n/2)+O(nlogn)

And if you know this element at k, you can also find the sum of all Si,j which are ≤ this element using a prefix sum array of the initial prefix sum array

Can you explain this further?

That is a neat solution, probably the intended one.

Let the prefix sum array of A be P, and the prefix sum array of P be Q. And you want to find sum of all S_{i,j} \le x.
For the i^{th} of the N sequences above, binary search to find the last index j such that S_{i,j} = x. Then \sum_{k=i}^j S_{i,k} = Q_j - Q_{i-1} - P_{i-1} \cdot (j - i + 1). It should make sense somewhat intuitively, but if it doesn’t you can write each term as a summation of elements of P and the math will work out :smiley:

Or maybe not… you can handle much larger number of queries with this approach.

Probably a much better solution than intended :slight_smile: