AVGPR - Editorial

Setter- Anuj Maheshwari
Tester- Misha Chorniy
Editorialist- Abhishek Pandey

Simple

PRE-REQUISITES:

Array, Looping, Basic Math and Principles of Counting, Frequency Array.

PROBLEM:

Given an array of length N, find number of unordered pairs (i,j) such that their average exists in the array (real division), or in other words, find number of unordered pairs of (i,j) for which we can find an element in array A_k which satisfies-

2*A_k=A_i+A_j where A_i and A_j are i 'th and j 'th element of the array.

Unordered pair means that order of listing elements in pair doesnt matter. It means (2,1) is treated same as (1,2). In ordered pairs, we treat both of them as different.

QUICK EXPLANATION:

We immediately see the low constraints for value of A_i. We can make a frequency array to record frequency of each array element. The only problem is, A_i can be negative as well. This is easily solved by adding a constant K (preferably 1000) to all array elements. With all elements positive, we iterate over all possible pair of values of array elements, i.e. we check all pairs of (a,b) where 0\le a,b \le 2000. We check if a, b and their average exists in array and update answer accordingly.

EXPLANATION:

This editorial will discuss 2 approaches. First is mine (Editorialist), and second one is Mishaâ€™s (@mgch), i.e. the testerâ€™s. My approach is a bit more difficult than testerâ€™s because I use some formulas and observations, while testerâ€™s solution is simply basic.

We noticed a lot of Div2 users getting stuck in lots of useless and unnecessary things, hence we will answer how to overcome such things in Chef Vijjuâ€™s Corner at the end of editorial.

1. Editorialistâ€™s Approach-

The first thing I did was to add 1000 to each element so that all elements of array are non-negative and between [0,2000]. We can show that if we add any constant to all values of array, then although the average of numbers increases, it has no effect on â€śexistence of average (of 2 array elements) inside array.â€ť Try to prove it if you can. (Answer is in tab below).

Click to view

Given-
2*A_k=A_i+A_j
\implies 2*A_k+2*K=A_i+A_j+2*K
\implies 2*(A_k+K)=(A_i+K)+(A_j+K)

\because Both the expressions turn out to be same, we see that adding a constant K to all elements in array has no effect on â€śwhether average of two array elements exist in the array or not.â€ť

For now, forget about duplicates. My approach counts even duplicates in the process, and removes them later at the end.

Once we have the frequencies of elements inside array, I iterate from all possible â€śpairs of valuesâ€ť allowed, i.e. , we know that now values of array elements are in range [0,2000], so we will iterate over all pairs of (a,b) where 0 \le a,b \le 2000 and do the following-

**1.**If (a,b) exist, and if their average (by real division) exists in the array as well, goto 2, else check next pair.

To remove duplicates, simply divide the answer by 2 - because each pair is counted twice. The proof for the 2 formulas I used, along with fact that dividing by 2 removes duplicates is left to reader as an exercise. (Answer/Hints are discussed in Chef Vijjuâ€™s corner.)

Time Complexity-O({2000}^{2}+N)=O(N+K)=O(N) (where K is a constant).

Testerâ€™s Approach (Easy)-

The first step is the same, he also made all elements non-negative by adding 1000. He deals with duplicates immediately.

1.For each possible value in [0,2000], check if it exists in array. If it does, goto 2, else check the next value.
2.Count all pairs formed by this value and its duplicates by adding cnt[mid] * (cnt[mid] - 1) / 2 to the answer. mid here is the value being investigated.
3.If mid is average of 2 numbers, this means both the numbers are equidistant from mid. Hence, count all pairs possible due to presence of equidistant numbers present at a distant x , where x is in range [0,mid] and mid+x and mid-x are in valid range of [0,2000]. For each such pair, add cnt[mid - x] * cnt[mid + x] to the final answer.

Time complexity is same as my solution in worst case :).

CHEF VIJJUâ€™S CORNER:

**1.**Lets first discuss the mistakes. Most of the participants got the formulas rightâ€¦but they did a tremendous blunder. Suffered from Overflow!!. If you are declaring your frequency array as int, then for larger test cases where N is as large as {10}^{5}, the frequency of elements can be as large. Hence, when we do freq[a]*freq[b], we must make sure to use proper type casting, else the answer will overflow!! Many contestants stored answer in an int variable, and that code is doomed to be wrong even before submission!!

One thing I will tell, if you are getting AC for smaller sub tasks, and WA for larger ones, do check for overflow!!. There is a good 60\% chance that it is overflow issue giving you WA. We were going through the wrong solutions in detail, and our heart sank a little bit each time we saw a solution killed because of overflow

**2.**Some of them mistook |A_i| \le 1000 as 0 \le A_i\le 1000. The difference between the two is that, the first condition allows negative integers in the input. Whenever in such confusion, assume value of A_i to be negative - and ask yourself - Does it satisfy this constraint? If yes, its in the input, else its not.

**3.**Derivation of freq[a]*freq[b] in tab below.

Click to view

Say we have values freq[a]=K and freq[b]=L.

Now, for each of the K values of a in array, we can form a pair with all L values of B. Hence add freq[a]*freq[b] to the final answer.

My code will add freq[a]*freq[b] for both pairs, (a,b) and (b,a). Since this portion gets executed when a \neq b, hence each pair is counted twice. Half proof of why halving the value of answer removes duplicates.

Proof when a==b is on same lines.

**4.**Whenever you see that the max value of array elements is comparable to size of the array, frequency array approach can be used. Its usually asked as a part of question rather than an individual question, like here it was Math (principle of counting) +Frequency array. With some modification, you can also count number of each character in a string using this approach.

5. Frequency Array approach to count number of each element is one of the steps involved in Counting Sort , an O(N) sorting technique.

6. What would we do if this question had constraints on A_i upto |A_i| \le {10}^{5}? It would be a difficult question, but still possible to solve. That problem then, can be solved using FFT. We create a polynomial P(x) such that P(x)=\sum_{0}^{2*{10}^{5}}(freq[i]*{x}^{i}). On squaring P(x), we will have freq[i]*freq[j] for all (i,j) between 1 \le i,j \le N, but we need to check the case when i=j. The next step to do is, if i+j is even, and freq[(i+j)/2]>0 , it has to be added to the final answer. This outline is just meant to introduce beginners that there is something like FFT existing in this universe (xD) and to offer an interesting insight to the solution. Time Complexity is O(|A|Log|A|)

7. Questions to practice frequency array-

(More will be added if I find more, or if the community helps )

5 Likes

Whatâ€™s Mistake in my code, It keep failing last test case. Please find it at https://www.codechef.com/viewsolution/18264590

Vijju you can add this problem to frequency array practice list:

What is wrong with this solution?Can you please tell@vijju123
https://www.codechef.com/viewsolution/18306988

Can you please explain the method when |A[i]|<10^5.

Help me find the flaws in this code. https://www.codechef.com/viewsolution/18395894

@vijju123 i think if one can generate the required polynomial and visualize and find what coefficents to add and multiply then one can use numpy library without even fully understanding fft(info about arguments will suffice).
Though for straightforward problems only.

count= ((c[i])*(c[i]-1))/2;

I think you didnt read overflow section of editorial carefully.

Let me illustrate it with an example

Say, I got-

int a=100000;
int b=100000;
long long c=a*b;

a*b will be equal to {10}^{10}, but since both a and b are int, the reuslt will also be stored in int type, and then the overflowed result will be stored in c.

oooh. Noooooâ€¦ Thanks btwâ€¦

xD. I am glad you got AC then

Thank you

1 Like
x=g.count(e);
y=g.count(f);

count function is linear w.r.t. frequency of number. Imagine if each number is present. That, combined with LogN factor from map can be a reason. Why not try replacing this count by frequency array? Let me know of the results.

@vijju123 Thank you.I have added freq array to count the frequency of a number without using multiset g as earlier but this time i am getting WA instead of TLE.Can you please check what is missing.
https://www.codechef.com/viewsolution/18331245

e=b[j];
//if(e>(2*b[i]))
//break;
f=2*b[i]-b[j];

What if f<0? It can surely arise if 2*b[i]<b[j]

Take case-

1
5
-1000 5 5 -1000 0

and see if value of f goes to negative. If it does, then its undefined behavior which is causing WA.

Its an advanced topic which requires a very good grasp over mathematical topics like FFT etc. I purposefully didnt went too deep there lest it would confuse you. Try to have a grasp on the pre-requisites first.

Perhaps true. I dont know much about python, but of course you dont need to go into all the proof etc. of FFT. Just need to know what it does and how to use it/adapt it to the problems

1 Like

AC code: CodeChef: Practical coding for everyone
WA code: CodeChef: Practical coding for everyone
Approach is similar to editorial. the only difference between the WA and AC code is the size of frequency array, in AC code it is 20001 and in WA code it is 2001.
please ,explain the reason behind this.