Practice

Easy

Frequency array

# PROBLEM:

You are given an array S consisting of N positive integers (1 based indexing). You are also given an integer T.
Find the number of quadruples (i,j,k,l) such that S[i]+S[j]+S[k]+S[l]=T, where 1\le i<j<k<l\le N

# QUICK EXPLANATION:

We try to find the number of quadruples (i,j,k_x,l) where k_x is a fixed term. We store the frequency of all pair sums S[i]+S[j] (i<j<k_x) in a frequency array Cnt[].

Next we iterate over all valid values of l. For each valid l_y, the number of pairs \{i,j\} such that quadruple (i,j,k_x,l_y) forms a quadruple (which sums to T) is equivalent to the frequency of value T-(S[k_x]+S[l_y]) in Cnt[].

This is done over all values of k_x to get the desired answer!

This solution is O(N^2+N) for every value of k_x (O(N^2) for computing frequencies and O(N) for iterating over all valid values of l). This can be improved by observing the below criteria.

The only new pair sums in the frequency array of k_{x+1}, as compared to frequency array of k_x, are pair sums over all valid pairs \{i,k_x\} where i<k_x.

Thus, we iterate through all values of k_x in a sequential fashion, and after finding all quadruples with fixed term k_x, we add pair sums of all valid pairs \{i,k_x\}\space(i<k_x) to the frequency array.

# EXPLANATION:

//ll is long long int, its used to prevent overflows
ll cnt = 0;

for(ll i = 1; i <= N; i++)
for(ll j = i + 1; j <= N; j++)
for(ll k = j + 1; k <= N; k++)
for(ll l = k + 1; l <= N; l++)
if(S[i] + S[j] + S[k] + S[l] == T) cnt++;


This brute force iterates through all possible quadruples (i,j,k,l) and checks if S[i]+S[j]+S[k]+S[l] equals T. If it does, it increases cnt by 1, which will hold the final answer.

The complexity of this code is O(N^4), as there are 4 nested loops, each doing an iteration through the array.

This will only \color{green}{AC} in the first subtask, while \color{red}{TLE} in the other subtasks due to larger constraints.
This motivates us to find a faster algorithm!

Another ‘solution’ would be to store pair sums S[i]+S[j] over all valid pairs \{i,j\} (1\le i<j\le N) in another array, say B. The answer would then be the number of pairs in B which sum up to T.

Notice that i \ne j \ne k \ne l, implying that a particular element can be taken only once in a quadruple. This case is however not handled by this logic, and results in \color{red}{WA}.

TC that kills this solution
4 15
1 2 3 6


It can be easily verified that the answer is 0 as there are no valid quadruples with sum 15.

The elements in array B would be

3  (1 + 2)
4  (1 + 3)
7  (1 + 6)
5  (2 + 3)
8  (2 + 6)
9  (3 + 6)


The number of pairs in B with sum 15 is 1 (7+8). However, quick inspection shows that the number 6 is repeated twice (once in 7 and once in 8), which violates the condition that i \ne j \ne k \ne l.

Thus, the above mentioned method has been proved wrong!

The reason why this incorrect method is shown is to outline the intuition that helps arrive at the correct solution!

The only flaw in the above solution was that the case of repeating elements was not handled. The solution we are going to talk about manages this problem quite efficiently!

Let us start by calculating the number of valid quadruples (i,j,k_x,l) with k_x as a fixed term.

Now we compute pairs sums S[i] + S[j] over all pairs \{i,j\} with i<j<k_x. Store this in another array, say L.
Similarly, compute pair sums S[k_x] + S[l] over all values of l where k_x is fixed (as stated earlier) and k_x<l. Store this in one more array, say R.

The number of quadruples with sum T (when k is fixed) is the number of valid pairs \{a,b\} such that L[a] + R[b] = T.

This implementation (fixing k and then finding valid quadruples) ensures that elements are not repeated while counting the number of quadruples!

Before we continue further, lets examine what we just did (and see if we can make it better!)

Lets focus on array L. Instead of storing all (valid) pair sums in L, couldn’t we just store its count in a frequency array?
Doing so enables us find the count of elements in L with a particular sum in O(1).

Thus, the frequency array stores the frequencies of every pair sum in L.

So, instead of iterating through all pairs \{a,b\}, we simply iterate through array R, and for each R[b], the number of values of a such that L[a]+R[b]=T, is simply the value at index T-R[b] in the frequency array L!

Also, for a memory optimized solution, we can notice that we don’t require storing the pair sums in R. Rather, we can compute all quadruples (for a fixed k) while iterating over the values of l on the fly!
We can achieve this by updating the frequency array L over all pair sums of all valid pairs \{i,j\}\space(i<j<k), and then iterating over all possible values of l. For each l_y, the number of values a such that L[a]+(S[k_x]+S[l_y])=T, is the value at index T-(S[k_x]+S[l_y]) in the frequency array L!

Analysis of the above mentioned approach

Let’s analyze the time complexity of the above method, for a fixed k.

Computing the pair sums for the frequency array takes O(N^2). Computing pair sums in R takes O(N), as the value of k is fixed, and l iterates only once through the input array. The total time complexity of the method for a particular k is

O(N^2+N) \approx O(N^2)

Doing similarly over all N values of k would take

O(N*N^2) = O(N^3)

This solution unfortunately \color{red}{TLE}'s in the last subtask.
However it can be made faster (time complexity mentioned above) by avoiding repetitive counting, and is outlined below!

Assume we want to process the number of quadruples for a particular k, say k_x. So the frequency array contains pair sums over all pairs \{S[i],S[j]\} (i<j<k_x).
Now, imagine we finished with the above task and now want to compute all quadruples with the fixed value of k as k_x+1. The frequency array in this case contains pair sums over pairs \{S[i],S[j]\} (i<j<k_x+1).

Notice that all the values in the first frequency array is included in the second frequency array! It can be easily verified that the only new additions are pair sums \{i,k_x\} for all valid values of i (i<k_x).

Proof

The first frequency array contains pair sums of all pairs \{i,j\} (i<j<k_x). Thus it contains pair sums of pairs :

\begin{aligned} \{1,2\},\{1,3\},\dots,\{1,k_x-1\}\\ \{2,3\},\{2,4\},\dots,\{2,k_x-1\}\\ \dots\\ \{k_x-2,k_x-1\} \end{aligned}

The second frequency array contains pair sums of all pairs \{i,j\} (i<j<k_x+1). It contains pair sum of pairs :

\begin{aligned} \{1,2\},\{1,3\},\dots,\{1,k_x-1\},\{1,k_x\}\\ \{2,3\},\{2,4\},\dots,\{2,k_x-1\},\{2,k_x\}\\ \dots\\ \{k_x-2,k_x-1\},\{k_x-2,k_x\}\\ \{k_x-1,k_x\} \end{aligned}

You can notice that the second frequency array contains all the pairs as in the first frequency array, with a few additions namely, pairs

\{1,k_x\},\{2,k_x\},\dots,\{k_x-1,k_x\}

Thus, we process the values of k in a sequential order(1,2,\dots), and keep updating the frequency array for all (valid) pairs \{i,k_x\} after computing the number of quadruples with element k_x!

The time complexity with this optimization is now O(N^2).

Analysis of the above approach

Let’s analyze the time complexity of this optimized method, for a fixed k, say k_x.

As we are building the frequency array over all the already present values in it, we just add pair sums of \{i,k_x\} to it.
Thus, computing the pair sums for the frequency array takes O(N).

Computing pair sums in R takes O(N), as the value of k is fixed, and l iterates only once through the input array. Thus, the total time complexity of the method for a fixed k is

O(N+N) \approx O(N)

Doing similarly over all N values of k would take

O(N*N)=O(N^2)

which is an optimal solution to the problem!

# IMPLEMENTATION:

• Let the input array be S. We shall be using 0-based indexing here.
• Let us make a frequency array Cnt[] to store the count of pairs with particular sum.
Notice that, the maximum value of T is 10^6. So, it is feasible to use an array of size T as the data structure for Cnt[]. Any pair with sum > T can be ignored, as the sum can never decrease (as all elements are positive).
Initially, all values in Cnt[] are set to 0.
• Make a variable ans to hold the final answer. You may need to use a 64 bit data type, as the value may exceed 10^9. Initially ans = 0.
• Now we iterate through all values of k, starting from 0 up to N-1. Let the value we are currently processing by k_x.
• Next, for every l_x\space(k_x<l_x), we add the value of \text{Cnt}[T-(S[k_x]+S[l_x])] to ans. But wait! We skip pair sums S[k_x]+S[l_x] if they exceed the value of T.
• That done, we now update the frequency array for all pairs \{i,k_x\} (i<k_x).
More formally, we add 1 to \text{Cnt}[S[i]+S[k_x]] for all valid values of i. Again, we skip all pairs with pair sum > T.
• Once we finish iterating through all values of k, we output ans which is the required answer!

# SOLUTION SNIPPET:

vi Cnt(T + 1); //To store the number of pairs with a particular sum

ll ans = 0; //Stores the final answer
for(int k = 0; k < N; k++){
for(int l = k + 1; l < N; l++){
if(A[k] + A[l] > T) continue;
//break as the pair sum exceeds T from this point forward

ans += Cnt[T - (A[k] + A[l])];
//Add the count of pairs with sum T - (A[k] + A[l]) to ans!
}

for(int i = 0; i < k; i++){
if(A[i] + A[k] > T) continue;

Cnt[A[i] + A[k]]++;
//Another pair with sum A[i] + A[k]
//So we increase the count by 1
}
}

cout << ans << endl; //Print the answer!


# TIME COMPLEXITY:

Processing all pairs \{k,l\} takes O(N^2), as there are 2 nested loops iterating through the entire array.

Processing and adding pair sums to Cnt[] (adding takes O(1) in an array) takes O(N^2), as again, there are two nested loops iterating through the whole array.

Thus, the overall time complexity is

O(N^2+N^2) \approx O(N^2)

# SOLUTIONS:

Editorialist’s solution can be found here.

# SIMILAR PROBLEMS:

Will be added! The search continues

Bonus : Could you tweak the above solution to find the number of quadruples with sum T, such that

1. Elements can be repeated
2. You may select at most 4 elements (in the sense, you can select less than 4 elements too). However, you can’t repeat elements. You can set aside the quadruple constraint for this.
3. Both 1. and 2.
Solutions

The below mentioned solutions are just outlines and require you to come up with the entire solution!

### 1

You can solve it using the incorrect method mentioned in the beginning, just remove the i<j constraint while computing pair sums

### 2

• Make the value of Cnt[0] = 1 at the beginning.
• Apart from storing pair sums in Cnt[], also add individual elements to the frequency count.
• Apart from calculating Cnt[T - (S[k] + S[l])], also calculate Cnt[T - S[k]].
• Point 1 was done to compute the number of elements in S with value T. This is to count the number of 1-sized tuples which sum to T.

### 3

Homework

Did you like the editorial? Do you have any other approaches to the problem? Any suggestions? Comment them below!

Experimental: For better evaluation of my editorials, I request all readers to rate the editorial (on a scale from 1-5, where 5 represents an awesome editorial)

• 1
• 2
• 3
• 4
• 5

0 voters

Also don’t forget to up-vote this post if you liked it !

8 Likes

You can also use hashing to do the job very easily

Yes. That was in my first version, but I removed it for two reasons

Also, frequency array is easier to understand, as compared to hashing.

2 Likes

Frequency array will be good for t < 10^6 while unordered map will be good for t> 10^6 due to space constraints.

How are you going to declare an array of size 10^9

For this particular problem, frequency array is the easiest approach for beginners. T >10^9 can be left as bonus.

Now it is correct right?

How to solve T>10^9 Just give a hint

Hashing, unordered map. We are just increasing value of T. With unordered map, it should work a bit around 1.5-1.8 second.

That’s because the setter was kind hearted. Unordered map can be easily hacked, which would make it O(N) per operation, leading to TLE. But yes, I get your point.

You know that you can use custom hash function for unordered map which many people do to avoid anti hash test cases against standard one?

1 Like

I see low rating of this editorial as compared to other editorials of mine. Can anyone tell me where I’ve fallen short? I will definitely improve the editorial (and my future editorials) based on your suggestions!

I felt that it was tough to understand.
But that is probably because I am dumb.

2 Likes

I really need people like you so that I can improve my content. Tell me. Which part of the editorial is difficult to grasp? Also, these zco editorials are written so that even beginners can understand them, so I feel this editorial isn’t upto the mark.
Just tell me where you’re stuck/confused and I shall resolve it asap.

It could be better if you could give examples in the later stages so that we can understand what you are trying to do.
Also, I feel that this is more of a vigorous proof kinda thing. If you could explain what is behind those equations it would help a bit.
Thanks for listening to my dumb opinion

For this, I sadly can’t do anything. As an editorialist, it’s my duty to give thorough proofs for every claim and method mentioned in the editorial.
However, for the convenience of beginners and those without much of a math background, I’ve just stated the claims and have enclosed the proofs in a hidden box, so as to not scare them away.

This is a valid suggestion, tho the editorial becomes lengthy and confusing at many points; Thus, examples are only given at places where the statements are not that clearly understandable.

In any case, you can simply tell me the part of the editorial which confuses/bothers you and I’ll see if it can be made better!

I found out that changing the order of loops(i, l) changes the answer even though intuitively I feel that the i loop should be first. Why is that?

Take this example.

4 11
1 3 4 100


Your logic would print 1, but the correct answer is 0. Why does this happen?
For 3 as a fixed value of k, your logic first includes pair sum 1+3 in the frequency array, and then iterates over the values of l.
Thus, when you encounter k+l=3+4, you’d query for the frequency of 11-(3+4)=4 in the frequency array, which is wrong, as you are including the number 3 twice in the quadruple sum (as 4 in the frequency array is a combination of values 1 and 3).

Yeah, now I understand. Thanks a lot.