NB3000 Editorial

Problem link
Problem setter: maazbinasad3
Problem tester: serius_white
Editorialist: maazbinasad3

Difficulty:

Medium-Hard

Prerequisites:

Probability, Discrete Mathematics

Problem:

Given a dataset whose instances have some features and one of two possible labels, you need to predict the probability of occurrence of each label given some features.

Explanation:

Since you need to predict a label given some features, you should think of conditional probability here. Conditional probability gives the chance of a ‘main’ event occurring given that some ‘other’ event has already occurred. Here, the ‘other’ event is the observed features and the ‘main’ event is the label, which can be either yes or no.
How do we find the conditional probability of the ‘main’ event given the ‘other’ event, when we only know the probability of the ‘other’ event given the ‘main’ event? The answer is Bayes' theorem. This is exactly the situation in the problem statement: you were given the data to compute the probability of the features given that the mail was spam or not spam, and you needed the probability of the email being spam (or not spam) given the features, i.e. the reversed conditional probability. More formally, to find P(A | B), you were given all the information needed to find P(B | A).
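In this notation, Bayes' theorem reads:

P(spam | B) = P(B | spam) * P(spam) / P(B)

where B is the observed feature, P(spam) is the prior probability of the spam label, and P(B) is the prior probability of the feature. The same formula with "not spam" in place of "spam" gives the other posterior.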
Suppose B is our single feature and you are trying to find the probability of the email being spam. To estimate P(B | spam), count all the instances where feature B occurred and the associated label was spam, and divide by the total count of spam labels. The same goes for finding the probability of not spam.
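As a minimal, self-contained sketch (the variable names and toy data below are made up for illustration and are not part of the problem input), estimating P(B | spam) for a single feature value by counting could look like this:

#include <iostream>
#include <vector>
#include <string>
using namespace std;

int main() {
    // Hypothetical toy data: one feature value per training instance and its label.
    vector<long long> featureValues = {1, 2, 1, 1, 3};
    vector<string> labels = {"yes", "no", "yes", "no", "yes"};
    long long B = 1; // observed feature value of the query email

    long long spamCount = 0, featureAndSpam = 0;
    for (size_t j = 0; j < labels.size(); j++) {
        if (labels[j] == "yes") {
            spamCount++;                              // total spam instances
            if (featureValues[j] == B) featureAndSpam++; // spam instances where B occurred
        }
    }
    // P(B | spam) = count(B and spam) / count(spam); here 2/3.
    double condProb = (double)featureAndSpam / (double)spamCount;
    cout << condProb << "\n";
}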
What if there are multiple features instead of one? This is where the binomial distribution comes in.

What does the binomial distribution do here? The binomial distribution describes the occurrence of only two kinds of events over independent trials. Since it was mentioned that this is a binomial distribution, you could treat the features as independent. That is, the occurrence of one feature does not depend on the others.
Conditional probabilities and prior probabilities of independent events can be multiplied together. Therefore, when we are given multiple features, we multiply their conditional probabilities and their prior probabilities separately, and plug these products into Bayes' formula to get the result.
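Putting it together, with independent features x1, ..., xk the posterior for the yes label becomes:

P(yes | x1, ..., xk) = P(yes) * P(x1 | yes) * ... * P(xk | yes) / (P(x1) * ... * P(xk))

and analogously for the no label. The setter's solution below computes these two products from the counted frequencies.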

Setter’s Solution:

#pragma GCC optimize("O2")
#include<iostream>
#include<algorithm>
#include<limits>
#include <cmath>
#include<unordered_map>
#include<vector>
#include<unordered_set>
#include<set>
#include<stack>
#include<queue>
#include<cstring>
#define pb push_back
#define all(r) r.begin(),r.end()
#define f first
#define s second
#define um unordered_map
#define pi pair<int,int>
#define vi vector<int>
#define mod 1000000007
#include<fstream> 
typedef long long int ll; typedef long int li;
using namespace std;
// Helper functions from the setter's template (not used by the solution below).
bool isPrime(ll x) {
    for (ll i = 2; i * i <= x; i++) {
        if (x % i == 0) return false;
    }
    return true;
}
ll gcd(ll a, ll b) {
    if (a == 0) return b;
    if (b == 0) return a;
    if (a == b) return a;
    if (a > b) return gcd(a - b, b);
    return gcd(a, b - a);
}
int main()
{
    ifstream fin;
    ofstream fout;
    //fin.open("ipl-inp-1.in");
    //fout.open("ipl_out1.out");
    ll t;
    cin >> t;
    while (t--) {
        ll n, l, e;
        cin >> n >> l;

        // v[i][j] = value of feature i for training instance j (n features, l instances).
        vector<vector<ll>> v;
        for (int i = 0; i < n; i++) {
            vector<ll> a;
            for (int j = 0; j < l; j++) {
                cin >> e;
                a.push_back(e);
            }
            v.push_back(a);
        }

        // label[j] = "yes" or "no" for training instance j.
        vector<string> label;
        string s;
        for (int i = 0; i < l; i++) {
            cin >> s;
            label.push_back(s);
        }

        // Feature values of the query instance.
        ll f;
        vector<ll> features;
        for (int i = 0; i < n; i++) {
            cin >> f;
            features.push_back(f);
        }

        // Frequency counts for each query feature value: occurrences overall,
        // occurrences in "yes" instances, and occurrences in "no" instances.
        unordered_map<ll, ll> conditional_probab_yes;
        unordered_map<ll, ll> conditional_probab_no;
        unordered_map<ll, ll> prior_probab;
        for (int i = 0; i < n; i++) {
            for (auto vect : v) {
                for (int j = 0; j < l; j++) {
                    if (features[i] == vect[j]) {
                        prior_probab[features[i]]++;
                    }
                    if (features[i] == vect[j] && label[j] == "yes") {
                        conditional_probab_yes[features[i]]++;
                    }
                    if (features[i] == vect[j] && label[j] == "no") {
                        conditional_probab_no[features[i]]++;
                    }
                }
            }
        }

        // Counts of the two labels, used for the priors P(yes) and P(no).
        ll post_yes = 0;
        ll post_no = 0;
        for (int i = 0; i < l; i++) {
            if (label[i] == "yes") post_yes++;
            else post_no++;
        }

        // P(yes | features) = P(yes) * prod P(feature | yes) / prod P(feature).
        double numerator = 1;
        double denominator = 1;
        for (auto i : conditional_probab_yes) {
            numerator *= ((double)i.second / (double)post_yes);
        }
        numerator *= ((double)post_yes / (double)l);
        for (auto i : prior_probab) {
            denominator *= ((double)i.second / (double)l);
        }
        double ans = numerator / denominator;

        // P(no | features), computed the same way.
        double numerator1 = 1;
        double denominator1 = 1;
        for (auto i : conditional_probab_no) {
            numerator1 *= ((double)i.second / (double)post_no);
        }
        numerator1 *= ((double)post_no / (double)l);
        for (auto i : prior_probab) {
            denominator1 *= ((double)i.second / (double)l);
        }
        double ans1 = numerator1 / denominator1;

        cout << ans << " " << ans1 << "\n";
    }
}