SONYASEG - Editorial

vijju123 · June 16, 2018, 9:20pm

PROBLEM LINK:

Setter- Igor Barenblat
Tester- Misha Chorniy
Editorialist- You wont believe who this guy is when you click here

DIFFICULTY:

Easy

PRE-REQUISITES:

Combinatorics, Finding ^{n}C_k\% p using Fermat’s Little Theorem, Data structures like multiset.

PROBLEM:

Find number of ways to choose K out of N segments such that their common intersection is empty. Common intersection, in other words, means L_1 \cap L_2 \cap ... \cap L_k = \phi i.e. (empty/null). Print answer modulo {10}^{9}+7

QUICK EXPLANATION:

Key to AC- Its very easy if you have some practice in combinatorics, else the intuition may seem tricky. Finding number of violating cases was easier than finding number of good ones. Calculating answer as Ans=Total Cases-Bad Cases. Total cases, obviously, is ^{n}C_k.

We pre-calculate the inverse and factorial of required numbers to calculate ^{n}C_k in O(1) time. We can easily maintain a multiset (to maintain order). We can easily formalize when all K elements intersect as if min(R_1,R_2,..,R_{k-1}) \ge L_i where segment \{L_i,R_i\} is the segment under consideration. After this, its simple calculation of ^{n}C_k. We will discuss derivation under explanation section.

(WHAT? You want me to give away everything in Quick Explanation? xD.)

EXPLANATION:

This editorial will have a single section. Its assumed that you went through how to calculate ^{n}C_k as we wont be discussing this in detail in official part. We will simply see the intuition and derivation of formula. I will give whatever links I can for ^{n}C_k in Chef Vijju’s Bonus Corner :). We will refer to @mgch (tester’s) solution for implementation.

1. How to deduce that ans is calculated by finding number of bad combinations and subtracting them from total combinations?

This comes with practice. Really! It will seem something trivial to someone who is well versed with such questions. However, if you want what concrete steps we can analyze are-

Easy formalization of condition when all K segments intersect.
Total ways are easy to find. Simply ^{n}C_k
We will see below that a direct formula exists to find number of violating segments.

2. I have taken the input. What next?

Well, whats next? We solve the question! The very first thing to do is, we sort the segments. We sort the segments in increasing order of L_i. If L_i are same, then the segment with larger R_i comes first.

Why R_i in descending order? Simple, because if L_i are same, then inserting larger segment first helps us to easily find if k segments intersect. (Why?Q1)

Now we will maintain a multiset of lines. Multiset is used as it keeps them in sorted order. There are many implementations on what to store and how to use. Giving details of most easy one, I quote “we need not make a custom data type, merely storing the end points of lines can do.” (Why?Q2) A hint is given in tab below.

Click to view

Focus on condition min(R_1,R_2,..,R_{k-1}) \ge L_i. What are we comparing? What things are hence, needed to be stored in set? Did we account of L_1,L_2,..L_{k-1} anywhere above? Is it necessary to hence, store L_1,L_2...?

The multiset will store the R_i in ascending order. Now, when do 2 horizontal lines intersect? Can you try framing conditions on when they interesect and when they dont?

Click to view

Obviously! When R_1\ge L_2 where lines are \{L_1,R_1\} and \{L_2,R_2\}. When will they not intersect hence? Easy to see now, if L_2>R_1.

Now, we will follow a straightforward procedure-

For every segment \{L_i,R_i\} do the following-
WHILE multiset is not empty, and the lines dont intersect- Delete the line with smallest R_i from multiset and check again.
Number of violating ways using this segment \{L_i,R_i\} are ^{p}C_{k-1} where p=size \space of \space multiset
Insert end-point of this line in the set.

Lets discuss the intuition behind it. Step 1 is simple looping. Step 2, we discussed above when line intersect and when they dont. We need all k lines to intersect for a way to be violating. Hence, if i'th lines doesn’t intersect, we delete the line with smallest R_i from multiset. Now, either the multiset is empty, or we have number of lines which intersect with given i'th line. (Why?Q3)

Now, what we have in multiset is a set of lines which intersect with i'th line. We must choose i'th line, and are free to choose rest k-1 lines from the multiset. If, thus, size of multiset is p, then number of ways of choosing k-1 lines is simply ^{p}C_{k-1}. A simple code for above is given below-

    multiset < int > f;
	int bad = 0;
	for (int i = 1; i <= n; ++i) {
		while (!f.empty() && *f.begin() < p[i].first) {
			f.erase(f.begin());
		}
		bad = (bad + 1LL * comb(f.size(), k - 1)) % MOD;//comb(a,b)=aCb
		f.insert(p[i].second);
	}

SOLUTION:

The codes which I received are pasted below for reference. This is done so that you dont have to wait for @admin to link solutions up :-). Please copy them and paste at a place where you are comfortable to read :).

Setter

Click to view

 #pragma GCC optimize("O3")
#pragma GCC target("sse,sse2,sse3,ssse3,sse4,popcnt,abm,mmx,avx,tune=native")
#pragma GCC optimize("unroll-loops")
#include <bits/stdc++.h>
 
#define fast ios_base::sync_with_stdio(0); cin.tie(0); cout.tie(0);
#define files(name) name!=""?freopen(name".in","r",stdin),freopen(name".out","w",stdout):0
#define all(a) a.begin(),a.end()
#define len(a) (int)(a.size())
#define elif else if
#define mp make_pair
#define pb push_back
#define fir first
#define sec second
 
using namespace std;
#define int long long
 
typedef unsigned long long ull;
typedef pair<int,int> pii;
typedef vector<int> vi;
typedef long double ld;
typedef long long ll;
 
const int arr=2e5+10;
const int ar=2e3+10;
const ld pi=acos(-1);
const ld eps=1e-10;
const ll md=1e9+7;
 
///---program start---///
 
#define arr (int)(4e5+10)
 
int f[arr];
int rf[arr];
 
int bpow(int a,int n)
{
    int res=1;
    while (n){
        if (n&1){
            res=res*a%md;
        }
        n/=2;
        a=a*a%md;
    }
    return res;
}
 
int C(int n,int k) /// ways to comb
{
    if (k<0||k>n){
        return 0;
    }
    return f[n]*rf[k]%md*rf[n-k]%md;
}
 
void solve()
{
    int n,k;
    cin>>n>>k;
    vector<pii> a(0);
    for (int i=0;i<n;i++){
        int l,r;
        cin>>l>>r;
        a.pb({l,r});
    }
    sort(all(a));
    multiset<int> R;
    R.clear();
    int ans=C(n,k); /// all ways
    for (auto i:a){
        while (!R.empty()&&*R.begin()<i.fir){ /// delete all segments where r[j]<i.first
            R.erase(R.begin());
        }
        ans-=C(len(R),k-1); /// minus bad ways
        ans%=md;
        ans+=md;
        ans%=md;
        R.insert(i.sec);
    }
    cout<<ans<<"\n";
}
 
main()
{
    #ifdef I_love_Maria_Ivanova
        files("barik");
        freopen("debug.txt","w",stderr);
    #endif
 
    fast;
 
    /// precalc factorial and inverse
    f[0]=1;
    for (int i=1;i<arr;i++){
        f[i]=f[i-1]*i%md;
    }
    rf[arr-1]=bpow(f[arr-1],md-2);
    for (int i=arr-2;i>=0;i--){
        rf[i]=rf[i+1]*(i+1)%md;
    }
 
    int test;
    cin>>test;
    while (test--){
        solve();
    }
}

Tester

Click to view

    #include <bits/stdc++.h>
 
using namespace std;
 
const int MaxN = (int)4e5 + 10;
const int MOD = (int)1e9 + 7;
const int INF = 1e9;
 
int n, k;
int fact[MaxN], rfact[MaxN], inv[MaxN];
pair < int, int > p[MaxN];
int fenw[MaxN];
 
long long readInt(long long l,long long r,char endd){
	long long x=0;
	int cnt=0;
	int fi=-1;
	bool is_neg=false;
	while(true){
		char g=getchar();
		if(g=='-'){
			assert(fi==-1);
			is_neg=true;
			continue;
		}
		if('0'<=g && g<='9'){
			x*=10;
			x+=g-'0';
			if(cnt==0){
				fi=g-'0';
			}
			cnt++;
			assert(fi!=0 || cnt==1);
			assert(fi!=0 || is_neg==false);
			
			assert(!(cnt>19 || ( cnt==19 && fi>1) ));
		} else if(g==endd){
			if(is_neg){
				x= -x;
			}
			assert(l<=x && x<=r);
			return x;
		} else {
			assert(false);
		}
	}
}
string readString(int l,int r,char endd){
	string ret="";
	int cnt=0;
	while(true){
		char g=getchar();
		assert(g!=-1);
		if(g==endd){
			break;
		}
		cnt++;
		ret+=g;
	}
	assert(l<=cnt && cnt<=r);
	return ret;
}
long long readIntSp(long long l,long long r){
	return readInt(l,r,' ');
}
long long readIntLn(long long l,long long r){
	return readInt(l,r,'\n');
}
string readStringLn(int l,int r){
	return readString(l,r,'\n');
}
string readStringSp(int l,int r){
	return readString(l,r,' ');
}
 
 
int comb(int n, int k) {
	if (n < k) {
		return 0;
	}
	return 1LL * fact[n] * rfact[k] % MOD * rfact[n - k] % MOD;
}
 
bool cmp(pair < int, int > a, pair < int, int > b) {
	return a.first < b.first || a.first == b.first && a.second > b.second;
}
 
int en;
 
void solve() {
	n = readIntSp(1, 4e5);
	en += n;
	assert (en <= 4e5);
	k = readIntLn(1, n);
	for (int i = 1; i <= n; ++i) {
		p[i].first = readIntSp(1, 1e9);
		p[i].second = readIntLn(p[i].first, 1e9);
	}
	sort(p + 1, p + n + 1, cmp);
	multiset < int > f;
	int bad = 0;
	for (int i = 1; i <= n; ++i) {
		while (!f.empty() && *f.begin() < p[i].first) {
			f.erase(f.begin());
		}
		bad = (bad + 1LL * comb(f.size(), k - 1)) % MOD;
		f.insert(p[i].second);
	}
	printf("%d\n", (comb(n, k) - bad + MOD) % MOD);
}
 
int main() {
//	freopen("input.txt", "r", stdin);
	fact[0] = rfact[0] = 1;
	for (int i = 1; i < MaxN; ++i) {
		inv[i] = i == 1 ? 1 : 1LL * (MOD - MOD / i) * inv[MOD % i] % MOD;
		fact[i] = 1LL * i * fact[i - 1] % MOD;
		rfact[i] = 1LL * inv[i] * rfact[i - 1] % MOD;
	}
	int t = readIntLn(1, 1e3);
	while (t --> 0) {
		solve();
	}
	assert (getchar() == EOF);
	return 0;
}

Editorialist’s solution will be put on demand.

Time Complexity-O(NlogN)

CHEF VIJJU’S CORNER:

1. Ans Q1- Why R_i in descending order? Simple, because if L_i are same, then inserting larger segment first helps us to easily find if k segments intersect.

Click to view

Ans: Say we dont do that. If we insert smaller R_i first and then larger R_i, we may do wrong at step where we calculate number of bad ways, because the large segment may not have been inserted till then. Also, some smaller segments which intersect with larger one, but not with the small segment before them may get deleted. This will cause WA.

2. Ans Q2-I quote “we need not make a custom data type, merely storing the end points of lines can do.” (Why?Q2)

Click to view

We already sorted lines by their starting point. Hence a line appearing at earlier is guaranteed to start before current line i. And if that line (say line j) has R_j>L_i, they obviously intersect. We dont need the L_i values once the line is inserted mainly due to our previous step of sorting the line

3. Ans Q3-Now, either the multiset is empty, or we have number of lines which intersect with given i'th line.

Click to view

What was our condition for all k lines to intersect?

min(R_1,R_2,..,R_{k-1}) \ge L_i

Multiset gives us smallest R_i. Think the rest of it now!

4. You can find inverse-factorial in O(N) time. Refer here or at tester’s code.

5. What lies in here?

Click to view

???

Click to view

Compiling…

Click to view

Compiling…

Click to view

Compiling…

Click to view

Running…

Click to view

Running…

Click to view

Running…

Click to view

Running…

Click to view

COMPILATION ERROR ON TEST CASE 256

6. Test Case Bank

Click to view

All Test cases were huge. Need WA solutions to analyze

EDIT- One of the extra test case from setter-

7. Related Problems-

Team Players

thecodearrow · June 18, 2018, 12:49am

Do you have a list of ALL problems for which you’ve made editorials?

Seriously, this is god level editorial-preparation skill!

Take a bow! /\

tamiliit · June 18, 2018, 10:54am

I have a doubt regarding “Q1- Why Ri in descending order?”

I think the order of Ri does not matter, because we will not remove any element in step 2 expect for the first interval with same Li.
Hence the computations in step 3 does not care about the order of Ri.
Am I missing something?

I also ran a random test case check for Ri increasing and Ri decreasing and there was no difference in the result.

My code: CodeChef: Practical coding for everyone

shahanurag · June 18, 2018, 1:46pm

My approach :

sort according to Li in increasing order.
for each segment {Li, Ri} calculate the number of segments which have Lj < Ri(j > i) (let it be = x)
add (n-i-1)C(k-1) - (x)C(k-1) to answer because the total number of sets of segments formed with {Li, Ri} in the set would be choosing (k-1) from (n-i-1) (as we don’t want to double count we look at segments from [i,n-1] only) and the number of wrong sets would be = choosing (k-1) segments from those x segments which overlap with {Li, Ri}.


[1]

This approach is giving wrong answer verdict. Please tell me what I have missed.


  [1]: https://ideone.com/4C50OS

tamiliit · June 18, 2018, 2:07pm

@shahanurag consider intervals [1, 2], [3, 4], [5, 6].

For the interval [5, 6] x = 2. “choosing (k-1) segments from those x segments which overlap with {Li, Ri}” u will count (x)C(k - 1). But actually [1, 2] and [3, 4] does not intersect with [5, 6]

You need to be very careful while arguing for optimality of greedy solutions.

@update:
If you want to automatically find test cases where your solution fails, I suggest you copy code from CodeChef: Practical coding for everyone and put your logic in greedy2 function and run function_check in main.
Pro tip also adjust N_MAX and V_MAX to get simpler test cases.

vijju123 · June 18, 2018, 1:01am

Thank you for your kind words :). You can access the editorials (and hence problems) by looking at my discuss profile.

soham1234 · June 18, 2018, 11:09am

Yes it is not necessary. I was also gonna comment the same. Reason:- Only where sorting order would have mattered was in case of same Li. But Ri>=Li gurantees that Ri’s for same Li’s would stay in the multiset for all elements having Lj=Li.

vijju123 · June 18, 2018, 11:36am

Oh is it so? Could be. The schedule was very tight, and I saw tester take lot of pain to make the comp function sort lines this way. I will do the needful. Will first show this to @mgch to help me verify and then do the needed changes. Thanks again!

(I got your intuition now. Its just that there is very less time for experimenting. Thanks for pointing it out :D)

shahanurag · June 18, 2018, 2:14pm

@tamiliit I am sorry I forgot to mention j > i. I’ll update it.

shahanurag · June 18, 2018, 4:01pm

@tamiliit thanks

pareksha · January 31, 2020, 8:49am

I ran the test case
1
5 3
1 4
3 5
5 7
8 10
9 14
This gave me 10 as answer according to your code and explanation.

But I don’t understand why 10 is correct (5c3 = 10 which means every combination of 3 is valid).
According to me, only 2 should be the answer. I don’t know why combinations like
[(1 4), (3 5), (5 7)] are taken to be valid, when these are clearly overlapping.

Please correct me if I have missed anything in the question.

Thanks!

anon73162591 · January 31, 2020, 9:35am

There’s no point which is contained in each of the chosen segments [(1 4), (3 5), (5 7)]

Read the problem again

pareksha · January 31, 2020, 5:18pm

Oh yeah. Point should be common in all the segments for them to be overlapping. I misinterpreted the problem. I thought the statement meant to be common in any two segments.

Thanks!