EQMD - Editorial

vijju123 · July 26, 2019, 1:03pm

PROBLEM LINK:

Div1
Div2
Practice

Setter- Erfan Alimohammadi
Tester- Roman Bilyi
Editorialist- Abhishek Pandey

DIFFICULTY:

EASY

PRE-REQUISITES:

Ad-Hoc Algorithms , Observations

PROBLEM:

Given an Array A of N integers. Answer Q queries, where each query changes value of some array element at index id to value v. After each query, find maximum number of multi-sets you can form such that each multi set has the same median.

QUICK-EXPLANATION:

Key to AC- The number of multi-sets possible (with same median) is equal to frequency of median element in the array.

For quick explanation, I will give two versions of same theorem.

If you join merge two multisets which have same median M, then the merged multiset will also have a median M under our given rules (that no average happens in case of even size). We use it as - lets say I have K multisets with same median M, if I merge all of them, then I will get the entire array. By our theorem, the array and each of the K multiset must have same median - which is the median of the array.
The alternate wording of the above theorem is, “Under our given conditions, the median of the array is the only possible candidate for median of multisets.”

Hence, frequency of median element can be the key factor to answer the query! In fact, as you will later see, we will derive the formula as-

Ans=F_M-F_L+min(F_L,F_R) where-

F_M= Frequency of the median
F_L= Frequency of elements Less than the Median.
F_R= Frequency of elements Greater than the Median.

All that is left is, to find median of the array after every query, and required frequencies. We can find this, as well as handle updates by using an appropriate Data Structure like Segment Tree, or even simply 2 sets.

EXPLANATION:

Since the question is heavily dependent on the above theorem, we will first prove the theorem. Its use afterwards, is pretty obvious. The proof for alternate wording of the theorem, is left as an exercise for reader to attempt after checking proof for original theorem (hint/guideline will be in the bonus section). Once we are able to convince ourselves of the correctness of the theorem, we will discuss how it is used to find answer to the query. This helps us define the operations we need to support - and hence the last part will discuss how to maintain median and find required frequencies across all queries.

1. Proof of “If we merge 2 multisets of having same median M, then the merged multiset will also have M as median.”

Lets say I have 2 multisets, A and B, each having median M. Assume that their sizes are N_A and N_B respectively. Now, we will merge them to form a new multiset, and have to prove that this new multiset’s median is also M.

We see that with respect to N_A and N_B, there are 4 possibilities-

1. N_A and N_B are both odd - Here, for A, there will be \lfloor \frac{N_A}{2} \rfloor elements to either side of M (i.e. there are \lfloor \frac{N_A}{2} \rfloor elements smaller than M and an equal number of elements larger than M). Similar holds for array B. On combining them to the same multiset, we get the new size S=N_A+N_B. We can easily see that there are \lfloor \frac{N_A}{2} \rfloor+\lfloor \frac{N_B}{2} \rfloor elements on either side of M in the new multisets - and the middle 2 elements are having the value of hence M itself. Hence, median of the new multiset is also M.
2. N_A is even and N_B is odd- A has median of M, this means we can divide the array into 2 parts of equal size (by partitioning the sorted array in middle) such that first part has all elements \leq M and second part has all element \geq M. Now, the first part will go in left half of B, and second part will go to the right half of B. Since the size of both halves is same, the middle element of the combined multiset is unchanged and remains as M.
3. N_A is odd and N_B is even- The above method of point 2. can be repeated after swapping A and B.
4. N_A and N_B are both even- By now you’d have got the intuition of how to go about it. Sort both A and B, divide each of them in middle and merge. Its easy to see that middle 4 elements are equal to M itself!

No other case is possible. Hence, with all possibilities covered, the theorem stands proved.

2. Using the above Theorem-

The above theorem seems fancy. But whats the use if it does not help in the problem? . But fear not! It actually solves the entire problem for us

Let us say a maximum of K multisets are possible. I feel that if I can know that what are possible candidate array elements which can be their median, then that might help me!

I say that the multisets have median M. Recall that question imposes that all multisets should have the same median! I am now interested in finding what possible values M can take!

If I merge any 2 of the multisets, I will get a new multiset with same median M. If I merge this new multiset with another multiset, I will still get a multiset with median M. So, I go on merging until all multisets are merged, and I will see that their median is M only. But after merging all the multisets I get the array itself! Hence M can only take the value of median of the array.

Hence, the median of each multiset must be M, and hence not more than freq(M) multisets are possible where freq() tells us the frequency of the element in array.

3. How to find median and required frquencies for a query-

The simplest and the easiest way to achieve that would have been to use Policy Based Data Structure (Order Statistic Tree). Unfortunately, it gets a TLE (time taken : 3.84 \ seconds). But we should discuss this nonetheless, as it is very useful for future problems.
This data structure is an extension to set data structure, and supports even more operations than a set, namely - finding the K'th element of the set, count of elements less than K & etc. Hence, we can do insertions, deletions, finding median and getting required frequencies in O(LogN) very comfortably using its in-built functions .

Another way to achieve the same is to use 2 sets. Say my median is M. I will put all elements \leq M in one set, and all elements \geq M in another - and will try that their sizes are as close as possible (i.e. sizes are equal if N is even, and have a difference of 1 if N is odd).

Insertion and deletion happen as normally in the sets - you just have to chose appropriate set to delete from and insert into, while maintaining the almost equal size (i.e. exactly equal if N is even, and a difference of 1 if N is odd). Once that is done, the candidates for median are either the last element of first set (one with all elements \leq M) or the first element of the other set. This exactly depends on how you are maintaining the set, and if you are putting the extra element (for odd N) in first set or the second set &etc. But fundamentally, the idea is same - that the last element of the first set or first element of the second set should give us the median.

To get the frequency of elements less than or greater than M, get the frequency of M , i.e. the median, and use the fact that median is equally distributed (or almost equally distributed in case of odd parity) among the 2 sets. Eg- If the frequency of median is 6, and my N is even, then I know that each set has 3 occurences of it. Now simply doing set.size()-3 will give the me required frequency depending on the set. (Again, the exact method depends on your implementation!)

Or, if you don’t like the set methods, a solution from segment tree is also possible. I have previously written a lot in my editorials on how to do segment tree problems - and I feel its time for a fun exercise now! I’d like you all to analyze the segment tree solution given in detail and answer following questions (solutions will be posted after a week if no one is able to solve, but I want to encourage active participation)

Tell what the node of segment tree is storing.
Analyze the parent child relationship in it.

Now, all that is left is to derive the final formula.

With the theorem, we know that a maximum of Freq(M) multisets are possible where M is the median. In an ideal situation, where there are equal number of elements to the left and right of the median, the answer is freq(M). What if its not the case? Lets analyze them-

Let me denote the frequency of elements less than the median as F_L and greater than the median as F_R. Three cases arise, and we will give a proof by construction for each-

F_L == F_R - Freq(M) multisets are possible! We can do so by using multisets of type (L_i,M,R_i) and (M).
F_L < F_R - One thing to recall is that, F_R \leq Freq(M)+F_L i.e. the size of right side is less than the sum of frequency of median and sum of left size. (A similar rule applies for F_L as well, but that is irrelevant here. You can try proving the rule for F_R, the hint is in the bonus section.) Given above rule, we can make multisets of form (L_i, M, R_i) or (M,R_i) or (M) where L_i is some element to left of median and R_i is some element to right of median. We are able to get Freq(M) multisets in this case as well, by forming multisets of type (L_i,M,R_i) as long as possible, and then (M,R_i) or (M). Since Freq(M) + F_L \geq F_R and for this case F_L < F_R, we guarantee that first two type of multisets will exhaust all L_i and R_i.
F_L > F_R - This case is a little tricky. Tricky because, while same rule applies here, the multisets corresponding to multisets discussed above are (L_i,M,R_i) , (L_i,M) and (M). The second type of multiset does not have a median M, and hence we need to make it as (L_i,M,M). Now, in this case, freq(M) multisets are not possible, because for every multiset of type (L_i,M,M) we are consuming one extra median element! We need to make multisets of type (L_i,M,M) for every element L_i which cannot be paired with a corresponding R_i. Meaning, for F_L-F_R elements, we need to consume an extra median. Hence, answer for this case is Freq(M) - (F_L - F_R).

Combining all 3 cases into a single formula-

Ans=F_M-F_L+min(F_L,F_R)

You can verify that our formula accounts for all the 3 cases above!

SOLUTION

Segment Tree Solution for Exercise (Taken from some contestant)

#include <bits/stdc++.h>
using namespace std;
typedef long double ld;
typedef long long ll;
int n, q;
const int maxN = (int)1e6 + 100;
int t[4 * maxN];
int a[maxN];
int pos[maxN], ch[maxN];
 
void upd(int v, int tl, int tr, int pos, int d) {
    if (tl == tr) {
        t[v] += d;
        return;
    }
    int tm = (tl + tr) / 2;
    if (pos <= tm) upd(2 * v, tl, tm, pos, d);
    else upd(2 * v + 1, tm + 1, tr, pos, d);
    t[v] = t[2 * v] + t[2 * v + 1];
}
 
int get(int v, int tl, int tr, int l, int r) {
    if (l > r) return 0;
    if (tl == l && tr == r) return t[v];
    int tm = (tl + tr) / 2;
    return get(2 * v, tl, tm, l, min(r, tm)) + get(2 * v + 1, tm + 1, tr, max(l, tm + 1), r);
}
 
bool check_good(int le, int cnt, int ri) {
    return cnt >= ri - le && cnt >= le - ri + 1;
}
 
int find_med(int v, int tl, int tr, int addL, int addR) {
    if (tl == tr) {
        return tl;
    }
    int tm = (tl + tr) / 2;
    if (t[2 * v] == 0) {
        return find_med(2 * v + 1, tm + 1, tr, addL, addR);
    }
    else if (t[2 * v + 1] == 0) {
        return find_med(2 * v , tl, tm, addL, addR);
    }
    bool fl1 = check_good(addL, t[2 * v], addR + t[2 * v + 1]);
    bool fl2 = check_good(addL + t[2 * v], t[2 * v + 1], addR);
    assert(fl1 ^ fl2);
    if (fl1) {
        return find_med(2 * v, tl, tm, addL, addR + t[2 * v + 1]);
    }
    else {
        return find_med(2 * v + 1, tm + 1, tr, addL + t[2 * v], addR);
    }
}
void solve() {
    cin >> n >> q;
    //n = 100000;
    //q = 100000;
    if (n > 100000 || q > 100000) {
        exit(0);
    }
    vector < int > all;
    for (int i = 1; i <= n; i++) {
        cin >> a[i];
        all.push_back(a[i]);
    }
    for (int i = 1; i <= q; i++) {
        cin >> pos[i] >> ch[i];
        all.push_back(ch[i]);
    }
    sort(all.begin(), all.end());
    all.erase(unique(all.begin(), all.end()), all.end());
    int m = all.size();
    for (int i = 1; i <= 4 * m + 10; i++) {
        t[i] = 0;
    }
    for (int i = 1; i <= n; i++) {
        int where = lower_bound(all.begin(), all.end(), a[i]) - all.begin() + 1;
        upd(1, 1, m, where, 1);
    }
    for (int i = 1; i <= q; i++) {
        int id = pos[i];
        int where = lower_bound(all.begin(), all.end(), a[id]) - all.begin() + 1;
        upd(1, 1, m, where, -1);
        a[id] = ch[i];
        where = lower_bound(all.begin(), all.end(), a[id]) - all.begin() + 1;
        upd(1, 1, m, where, 1);
        int med = find_med(1, 1, m, 0, 0);
        int cntMed = get(1, 1, m, med, med);
        int cntL = get(1, 1, m, 1, med - 1);
        int cntR = get(1, 1, m, med + 1, m);
        assert(check_good(cntL, cntMed, cntR));
        cout << min(cntMed, cntMed - (cntL - cntR)) << '\n';
    }
}
int dp[40][40][40];
int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    //freopen("input.txt", "r", stdin);
    int tst = 100;
    cin >> tst;
    while (tst--) {
        solve();
    }
    return 0;
}

Tester (Hasan)

#include <iostream>
#include <algorithm>
#include <string>
#include <map>
#include <assert.h>
using namespace std;
 
 
 
long long readInt(long long l,long long r,char endd){
	long long x=0;
	int cnt=0;
	int fi=-1;
	bool is_neg=false;
	while(true){
		char g=getchar();
		if(g=='-'){
			assert(fi==-1);
			is_neg=true;
			continue;
		}
		if('0'<=g && g<='9'){
			x*=10;
			x+=g-'0';
			if(cnt==0){
				fi=g-'0';
			}
			cnt++;
			assert(fi!=0 || cnt==1);
			assert(fi!=0 || is_neg==false);
			
			assert(!(cnt>19 || ( cnt==19 && fi>1) ));
		} else if(g==endd){
			assert(cnt>0);
			if(is_neg){
				x= -x;
			}
			assert(l<=x && x<=r);
			return x;
		} else {
			assert(false);
		}
	}
}
string readString(int l,int r,char endd){
	string ret="";
	int cnt=0;
	while(true){
		char g=getchar();
		assert(g!=-1);
		if(g==endd){
			break;
		}
		cnt++;
		ret+=g;
	}
	assert(l<=cnt && cnt<=r);
	return ret;
}
long long readIntSp(long long l,long long r){
	return readInt(l,r,' ');
}
long long readIntLn(long long l,long long r){
	return readInt(l,r,'\n');
}
string readStringLn(int l,int r){
	return readString(l,r,'\n');
}
string readStringSp(int l,int r){
	return readString(l,r,' ');
}
 
int T;
int n,q;
int arr[100100];
 
 
int val,rep,si,le,ri;
 
map<int,int> A;
map<int,int>::iterator ii;
void add(int x){
	if(si == 0){
		si=1;
		val=x;
		le=0;
		ri=0;
		rep=0;
		A[x]++;
		return;
	}
	A[x]++;
	if(val<x){
		ri++;
	}
	if(x<val){
		le++;
	}
	if(si % 2 ){
		if(x < val){
			if(rep > 0){
				rep--;
			} else {
				ri += A[val];
				ii= A.lower_bound(val);
				ii--;
				val = ii->first;
				le -= A[val];
				rep = A[val] -1;
			}
		}
	} else {
		if(x >= val){
			if(rep +1 <A[val] ){
				rep++;
			} else {
				le += A[val];
				val = A.upper_bound(val)->first;
				rep = 0;
				ri -= A[val];
			}
		}
	}
	si++;
}
 
void del(int x){
	if(si ==1 ){
		si=0;
		val =-1;
		le=0;
		ri=0;
		rep = -1;
		A.clear();
		return;
	}
	if(si % 2){
		if(x >= val){
			if(rep > 0){
				rep--;
			} else {
				ri += A[val];
				ii= A.lower_bound(val);
				ii--;
				val = ii->first;
				le -= A[val];
				rep = A[val] -1;
			}
		}
	} else {
		if ( x< val){
			if(rep +1 <A[val] ){
				rep++;
			} else {
				le += A[val];
				val = A.upper_bound(val)->first;
				rep = 0;
				ri -= A[val];
			}
		} else if( x== val && rep == A[val]-1){
			if(rep +1 <A[val] ){
				rep++;
			} else {
				le += A[val];
				val = A.upper_bound(val)->first;
				rep = 0;
				ri -= A[val];
			}
		}
	}
	si--;
	A[x]--;
	if(x < val){
		le--;
	}
	if(val < x){
		ri --;
	}
	if(A[x] == 0){
		A.erase(x);
	}
}
 
 
int main(){
	//freopen("03.txt","r",stdin);
	//freopen("03o.txt","w",stdout);
	T=readIntLn(1,1000);
	while(T--){
		n=readIntSp(1,100000);
		q=readIntLn(1,100000);
		si=0;
		val=-1;
		rep=-1;
		le=0;
		ri=0;
		A.clear();
		for(int i=1;i<=n;i++){
			if(i==n){
				arr[i]=readIntLn(1,1000000000);
			} else {
				arr[i]=readIntSp(1,1000000000);
			}
			add(arr[i]);
		}
		while(q--){
			int idx, new_val;
			idx=readIntSp(1,n);
			new_val=readIntLn(1,1000000000);
			del(arr[idx]);
			arr[idx]=new_val;
			add(arr[idx]);
 
			int tk= min(le,ri);
 
			printf("%d\n",A[val] - (le - tk));
		}
	}
	assert(getchar()==-1);
	
}

Time Complexity=O(NLogN)
Space Complexity=O(N)

CHEF VIJJU’S CORNER

Hint for proving the alternate wording of the theorem

Note that this proof uses the fact that "No matter what I do, I cannot divide the elements into multisets having a median other than M (M being median of array). Start from full array, and focus on the multiset containing M. See that if you try to make its median as anything other than M, there is some multiset or other which is having an unequal median. You can make case as I made in earlier proof.

2.How does the question change if you are allowed to only partition the array into subarrays and each subarray has to be having same median? Can you think of anything to solve this version?

Tester's Notes

Find frequency of median call it M, find number of elements smaller than median call it L, bigger than median call it R.
Now the answer is: M - L + min(L,R)

It’s easy to see how we arrive here using this lemma:
“If we have 2 multisets with the same median, then the median of their union is also the same.”