KSUM - Editorial

ma5termind · April 19, 2016, 11:22pm

PROBLEM LINK:

Practice
Contest

Author: Sunny Aggarwal
Tester: Sergey Kulik
Editorialist: Sunny Aggarwal

DIFFICULTY:

Easy

PREREQUISITES:

Sorting, Greedy, Data Structure, Implementation.

PROBLEM STATEMENT:

Given an array A containing N positive integers and an integer K. We are asked to report the largest K values from the list of sums of all possible subarrays of array A.

EXPLANATION:

Subtask 1

Listing sums of all the possible subarrays of A and finding the largest K values will be enough to pass this subtask.

C++ Code

#define ll long long
void solve(int N, int K, int *arr) {
	vector sum;
	for(int i=1;i<=n;i++) {
		long long int s = 0;
		for(int j=i;j<=n;j++) {
			s += arr[j];
			sum.push_back(s);
		}
	}
	sort(sum.rbegin(), sum.rend());
	for(int i=0; i<=K-1; i++) cout << sum[i] << " ";
	cout << "\n";
}

Time Complexity

O((\frac{N \times (N+1)}{2})) \times \log{\frac{N \times (N+1)}{2}})

Complete Code Link

Note

Although the above solution will get passed on first subtask but we can have slight better complexity by maintaining a priority queue for the first K elements instead of sorting all the sums.

Complete Code Link

Subtask 2

It is easy to see that the above solution will time out for this subtask.

Then, how to approach it ?

It can be noticed that the largest and first value will always be sum of all the elements as it is given that all elements are positive integers. It means the sum is corresponding to the subarray [1 to N] inclusively. Now, we have taken up the range 1...N and we can see that the next possible largest sum will be the maximum of sum of range 2...N and range 1...N-1. Let us assume that the second largest is obtained from the range 1...N-1. Then, the third largest sum will be maximum of sum of range 2...N ( previous range left ), range 2...N-1 and range 1...N-2 ( new ranges ). The above procedure can be run K times to find the largest K values.

How to implement above idea ?

Let us maintain a priority queue S ( set can also be used in C++ ). So, whenever we are taking the sum of a range say [L to R] from S, we can simply insert 2 new values to S i.e sum of range [L+1 to R] and [L to R-1] if L != R. Note that along with sum of range we are also maintaining the indices i.e L and R denoting that range in our priority queue.

C++ Code

#define ll long long
void solve(int N, int K, int *arr) {
	set<pair<ll,pair> S;
	long long int prefix_sum[N+1];
	for(int i=1;i<=n;i++) {
		prefix_sum[i] = prefix_sum[i-1] + arr[i];
	}

	S.insert({prefix_sum[N], {1, N}});
	while( K -- && !S.empty() ) {
		pair<ll,pair> top = *S.begin();
		S.erase( top );
		long long int sum;
		int L, R;
		sum = top.first;
		L = top.second.first;
		R = top.second.second;
		cout << sum <<" ";
		if( L != R ) {
			S.insert({sum-arr[L], {L+1, R}});
			S.insert({sum-arr[R], {L, R-1}});
		}
	}	
}

TIME COMPLEXITY:

O(K \times \log{K})

AUTHOR’S AND TESTER’S SOLUTION:

Author’s solution can be found here.
Tester’s solution can be found here.

geek_geek · May 1, 2016, 12:20am

Complexity is

O(K x log K) or O(K x log n) ??

Cause the size of the set is maximum n

anupam_datta · May 1, 2016, 2:09am

how did this pass?

https://www.codechef.com/viewsolution/9987544

c1_6 · May 1, 2016, 6:33am

My O(n^2) solution passed on this - CodeChef: Practical coding for everyone

Edit: It’s actualy O(n^2 * log(n))

abhi123 · May 1, 2016, 2:03pm

why did this gave runtime(sigsegv) all the the time?
#include<stdio.h>
int main()
{
int n,i,j,s=0,t,a[1000],b[1000],k=1,m;
scanf("%d %d",&n,&m);
for(i=1;i<=n;i++)
scanf("%d",&a[i]);
i=1;
j=1;
while(i<=n)
{
while(j<=n)
{
s=s+a[j];
// printf("%d ",s);
b[k]=s;
k++;
j++;
}
s=0;
i++;
j=i;
}
k=n*(n+1)/2;
for(i=1;i<=k;i++)
{

	for(j=1;j<=k;j++)
	{
		if(b[j]<=b[j+1])
		{
			t=b[j];
			b[j]=b[j+1];
			b[j+1]=t;
		}
	}
}
for(i=1;i<=m;i++)
printf("%d ",b[i]);
return 0;

}

abhi123 · May 1, 2016, 2:07pm

why did this gave runtime(sigsegv) all the the time?
plz help somebody??
https://www.codechef.com/viewsolution/9986008

shuboy2014 · May 1, 2016, 3:09pm

In editorial there is mistake , S.rbegin() instead of S.begin() .

shuboy2014 · May 1, 2016, 3:19pm

for those who are getting runtime(sigsegv) , make sure your set S and vector array declaration in global space .

samarjeet27 · May 1, 2016, 4:07pm

@shuboy2014: By using S.rbegin() the array is sorted in descending order so as to access the first K elements from the front.

meight_888 · May 1, 2016, 4:45pm

confused to see the code

geniusharshit · May 12, 2016, 8:18pm

somebody please explain the code , unable to understand some part of code.

geniusharshit · May 12, 2016, 8:19pm

somebody please explain the code , unable to understand some part of code.

projjal16 · May 22, 2016, 11:03am

not able to figure out why using priority_queue is giving wrong answer?

mjguru · June 17, 2016, 6:03pm

Guys, writing editorials is hard. It is not fair to blame the writer for that. Here is my try to explain the solution:

First, lets be clear about what we are looking for: The K largest sums of contiguous segments (fancy way of saying continuous segments) in the array. Note, that all the numbers in the array are positive.

So, if K is 1, we will simply choose the sum of all elements as the answer.

Now lets try K = 2. We will first choose the sum of all the numbers as the answer. But now comes the tricky part: which subsegment should we choose among the thousand available? Simply, the one with the largest sum. Note, (try an example by hand) that the choice for the largest one will be among the sum of elements at indices [0, N-2] or the sum of elements between indices [1, N-1]. (we can’t choose [0, N-1] as that would be the sum of all elements, which we have already taken up).

With the last case, we have observed that choosing as large a range is good as all the numbers in the input are positive.

This leads us to the following theorem:
If we are choosing the sum of segment [L, R] and if the segment [L-1, R] and/or the segment [L, R+1] is not yet taken up, then our choice of segment [L, R] is not optimal.

This leads us to the following algorithm:

let Best be a set of currently considered subsegments
insert [0, N-1] into Best

for i = 1 -> K

Begin

let now = the highest summed segment 

print the sum of now 

let [L, R] be the start and end points of now 

insert L+1, R and L, R-1 into Best

End

mehe · August 15, 2018, 1:57am

is it named in any specific algorithm?

s1d_3 · May 1, 2016, 9:14pm

It’s actually O(n + klogk).
n for input and prefix sums.
And klogk as the while loop runs k times, and each time we insert and delete from the queue once, each having logk complexity.

abhj · December 4, 2019, 1:33pm

#include <bits/stdc++.h>
#define ll long long int
#define N 100000
#define M 1000000007
#define f(i,a,b) for(ll i=(ll)a;i<=(ll)b;i++)
#define rf(i,a,b) for(ll i=(ll)a;i>=(ll)b;i--)
#define po pop_back
#define pb push_back
#define lb lower_bound
#define ub upper_bound
#define ibs ios_base::sync_with_stdio(false)
#define cti cin.tie(0)
#define cot cout.tie(0)//solved by abhijay mitra
using namespace std;
#define watch(x) cout << (#x) << " is " << (x) << endl
int main(){
    ibs;cti; 
    ll n,k;cin>>n>>k;std::vector<ll> A(n+1);A[0]=0;
    ll sum=0;
    priority_queue<pair<ll, pair<ll,ll>> > s;
    for (int i = 1; i < n+1; ++i){
        cin>>A[i];sum+=A[i];
    }
    s.push({sum,{1,n}});
    while( k -- && !s.empty() ) {
        pair<ll, pair<ll,ll>> top = s.top();
        cout << s.top().first << " \n"[k == 0];
        s.pop();
        ll l = top.second.first;
        ll r = top.second.second;
        if( l != r ) {
            s.push(make_pair(top.first-A[l], make_pair(l+1, r)));
            s.push(make_pair(top.first-A[r], make_pair(l, r-1)));
        }
    }
    return 0;
}

Why is this showing wrong answer? I have used editorialist’s implementation

sigma_g · May 20, 2020, 7:12am

This gives WA, because notice that when we process l=3, r=5, then we add (l,r) = (3,4) and (4,5) to the queue. When processing each of these, we will add (3,3),(4,4),(4,4),(5,5) to the queue. Notice that (4,4) got added twice.

Editorialist used a set which removes duplicates by default.