CMX1P03 - Code To Get Her - Code Mates x1 Editorial

mayureshpatle · August 31, 2020, 5:20pm

PROBLEM LINK:

Contest
Practice

Author & Editorialist : Mayuresh Patle
Tester : Uttkarsh Bawankar

DIFFICULTY:

EASY

PREREQUISITES:

Combinations, Counting, Modular Arithmetic, Modular Multiplicative Inverse.

PROBLEM:

You are given a string S containing lowercase English alphabets. You are also given a list of N names (which are strings too, containing only lowercase English alphabets). For each of these names, you’ve to calculate the total number of sub-sequences of string S which are anagrams of that name. Since this value could be very large, calculate this value modulo 998244535. This value is called as the score of the name.

Finally, print the name having maximum score, if there are more than one such names, then print the one which appeared first in the list. If all names have 0 score, then print “No Research This Month” (without quotes).

You’ve to do this for M testcases.

QUICK EXPLANATION

The score of each name is, \prod ^{S.count(c)}C_{name.count(c)} for each unique character c in name. Here, ^nC_r represents the total number of ways to select r objects out of available n objects.

EXPLANATION:

By definition, if a two strings are anagrams of each other, then the frequency of occurrences of each unique character in both of the string must be same. For example, if two string A and B are anagrams of each other, and the character c occurs exactly x times in A then, c must occurs exactly x times in B and vice-versa.

Let’s denote the count of occurrences of character c in string A by A.count(c). Now, we want to count the number of sub-sequences of S which satisfy the the above condition. Hence,

We can first count and store the frequency of occurrences of each of the 26 alphabets in S.
Then, for each name in the list, we will store the count of occurrences of each alphabet.
Now, for each character c, we need to select exactly name.count(c) occurrences out of available S.count(c) occurrences in S. Total number of ways to select these for character c is ^{S.count(c)}C_{name.count(c)}.
Hence, the final score will be the product of above expression for each c \in lowercase alphabets.

Some points to mention:

The calculations might lead to overflow, hence, apply modulo operation after each step.
After applying modulo operation, direct division will give wrong results, hence to calculate \frac{A}{B} \mod 998244353, you will have to calculate (A\ * B^{-1}) \mod 9982443535. Here, B^{-1} is the modular multiplicative inverse of B under modulo 998244353.

Time Complexity : O(|S| + N*(|name|+26*log_2(9982443535) )) for each testcase.

SOLUTIONS:

C++ Solution

#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
typedef vector <ll> vll;

const ll md=998244353, maxN=100001;

//function for modular exponentiation, returns (n^p) mod md
ll modex(ll n,ll p)
{
    ll ans=1;
    n%=md;
    while(p)
    {
        if(p&1) ans=(ans*n)%md;
        p>>=1;
        n=(n*n)%md;
    }
    return ans;
}

ll fact[maxN]={1};

/*This function returns nCr mod md, i.e { n!/((n-r)!*r!) } mod md
nCr = no. of ways to select r objects out of n available objects
This function works in O(log2(md)) time*/
ll ncr(ll n,ll r)
{
    if(n<r) return 0;
    ll num=fact[n], den=(fact[n-r]*fact[r])%md;
    ll inv=modex(den,md-2);
    return (num*inv)%md;
}

int main()
{
    //fast io
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    cout.tie(0);

    ll M,N,i,maxscore;
    string S,name,result;

    for(i=1;i<maxN;++i) fact[i]=(i*fact[i-1])%md;  //calculate factorials

    cin>>M;
    while(M--)
    {
        cin>>S;
        result="No Research This Month";
        maxscore=0;

        vll cntS(26);
        for(char c:S) ++cntS[c-'a'];

        cin>>N;
        while(N--)
        {
            cin>>name;
            ll score=1;
            vll cntname(26);

            for(char c:name) ++cntname[c-'a'];
            for(i=0;score and i<26;++i)                     //calculate score
            {
                score=(score*ncr(cntS[i],cntname[i]))%md;
            }
            cout<<score<<"\n";                              //print score

            if(score>maxscore)                              //update result
            {
                maxscore=score;
                result=name;
            }
        }
        cout<<result<<"\n";                                 //print result
    }
    return 0;
}

Python 3 Solution

a='qwertyuiopasdfghjklzxcvbnm'
md=998244353
def ncr(n,r):                  #takes O(r*log2(md)) time
    num,den=1,1
    for i in range(r):
        num=(num*(n-i))%md
        den=(den*(i+1))%md
    inv=pow(den,md-2,md)
    return (num*inv)%md
    
for _ in range(int(input())):
    s=input()
    cnts={}
    for c in a: cnts[c]=0
    for c in s: cnts[c]+=1
    maxscore,result=0,"No Research This Month"
    for _ in range(int(input())):
        name=input()
        score=1
        cnt={}
        for c in a: cnt[c]=0
        for c in name: cnt[c]+=1
        for c in a:
            score=(score*ncr(cnts[c],cnt[c]))%md
            if score==0: break

        print(score)
        if score>maxscore:
            result,maxscore=name,score
    print(result)

not_abhishek · August 31, 2020, 6:23pm

I read the discussion title and thought it was related to human trafficking

wicked_knight · August 31, 2020, 6:57pm

Hi, I didn’t have much time today so couldn’t really spend time on the contest unfortunately : /

Could you tell me why this gives TLE?

import numpy as np 
SIZE = 26
def binomialCoeff(n, k): 
	res = 1
	if (k > n - k): k = n - k  
	for i in range(k): res *= (n - i);res = int(res/(i + 1)) 
	return res
def countSubsequences(str1, str2):
	freq1 = np.zeros(26, dtype = np.int);freq2 = np.zeros(26, dtype = np.int) 
	n1 = len(str1);n2 = len(str2)
	for i in range(n1): freq1[ord(str1[i]) - ord('a') ] += 1
	for i in range(n2): freq2[ord(str2[i]) - ord('a')] += 1
	count = 1
	for i in range(SIZE):
		if (freq2[i] != 0):
			if (freq2[i] <= freq1[i]): count = count * binomialCoeff(freq1[i],freq2[i])
			else: return 0
	return count
for _ in range(int(input())):
	S = input();ans = {}
	for _ in range(int(input())):
		name = input()
		print(countSubsequences(S,name))
		#ans.add(name,countSubsequences(S,name))
		ans[name] = countSubsequences(S,name)
	if(max(ans.values()) > 0): print(max(ans,key = ans.get))
	else: print("No Research This Month") #sample testcases passed, TLE

wicked_knight · August 31, 2020, 6:59pm

def past(arr,W,i):arr[i - 1] += W;return arr
def present(arr,L,R): print(sum(arr[L - 1:R]))
N = int(input()); arr = list(map(int , input().split()))
for _ in range(int(input())):
	input_ = list(map(str , input().split()))
	if(input_[0] == "past"): past(arr,int(input_[1]),int(input_[2]))
	else: present(arr,int(input_[1]),int(input_[2])) #sample testcases passed, TLE

even this code got TLE for the ‘dark’ problem

mayureshpatle · August 31, 2020, 7:09pm

In this function, the calculations are being done on large numbers, which is taking considerable amount of time. Hence, resulting in TLE.
You’ll have to use modular arithmetic. Please read the editorial.

wicked_knight · August 31, 2020, 7:12pm

okay thank you

mayureshpatle · August 31, 2020, 7:14pm

For Dark, the expected time complexity was O(Q*LogN). Editorial for Dark is available here.

mayureshpatle · August 31, 2020, 7:16pm

lol

anon80320319 · September 1, 2020, 3:06am

Can anybody help me why my code is wrong?
Link : CodeChef: Practical coding for everyone
I have tried to follow the same logic as editorial but can’t seem to find my mistake

sb_kmb · September 1, 2020, 3:15am

editorial for CodeChef: Practical coding for everyone ?

mayureshpatle · September 1, 2020, 5:14am

@sb_kmb Editorial for Code Together (CMX1P01): CMX1P01 - Code Together - Code Mates x1 Editorial

nalingoyal · September 1, 2020, 10:03am

@mayureshpatle can you please point out the mistake in my code :
https://www.codechef.com/viewsolution/37343577

I have used the same C++ code given at geeksforgeeks with some changes :

may be i am doing mistake with applying the mod…please help…any help is greatly appreciated

mayureshpatle · September 1, 2020, 11:20am

@nalingoyal In function binomialCoefficient, the values of n and k are large, so there will be an overflow during calculation. Use the method mentioned in the editorial.

nalingoyal · September 1, 2020, 5:09pm

@mayureshpatle i did as u suggested…i checked for overflow in binomial cofficient function…but i didnt get AC …plz have a look.
https://www.codechef.com/viewsolution/37370795

mayureshpatle · September 1, 2020, 5:54pm

@nalingoyal \frac{A}{B} \mod M \neq \frac{A \mod M}{B \mod M}.

\frac{A}{B} \mod M = ((A \mod M)*B^{-1}) \mod M, here B^{-1} is the modular multiplicative inverse of B under modulo M.

To use this, you will have to calculate \text{numerator} \mod M and \text{denominator} \mod M differently. Also, since M=998244353 is prime, you can use Fermat’s Little Theorem to calculate the modular multiplicative inverse of B under modulo M, which will be equal to B^{M-2} \mod M.

Hence, the result will be:

((\text{numerator} \mod M) * ((\text{denominator} \mod M)^{M-2}\mod M)) \mod M

Have a look at the ncr() function in Python 3 Solution provided in the editorial, it uses the same approach that you are trying to implement.

Also, you can read more about Modular multiplicative inverse here.

nalingoyal · September 1, 2020, 6:53pm

it is not resolved yet.plz help
lli power(lli x, unsigned long long int y, unsigned long long int mod)
{
if (y == `0)
return 1;
long long int p = power(x, y/2, mod) % mod;
p = (p * p) % mod;

return (y%2 == 0)? p : (x * p) % mod;

}
// Returns value of Binomial
// Coefficient C(n, k)
lli binomialCoeff(lli n, lli k)
{
lli res = 1,den=1;

// Since C(n, k) = C(n, n-k) 
if (k > n - k) 
	k = n - k; 

// Calculate value of 
// [n * (n-1) *---* (n-k+1)] / 
// [k * (k-1) *----* 1] 
for (lli i = 0; i < k; ++i) { 
	res *= (n - i);
            res %= mod;
	den*=(i+1);
	       den %=mod;
lli inv=power(den,mod-2,mod)%mod;
return (res*inv)%mod;
}

}

my solution: CodeChef: Practical coding for everyone

mayureshpatle · September 1, 2020, 7:03pm

@anon80320319 I had to spend a lot of time on debugging your code. And, finally found the bug. The problem is with this statement:

result = (result*(fact[x[i]]*binaryExp((fact[c[i]]*fact[x[i]-c[i]])%mod,mod-2))%mod)%mod;

Let’s make it look a little simpler,

Let, a = fact[x[i]]
b = (fact[c[i]]*fact[x[i]-c[i]])%mod
c = binaryExp(b, mod-1)

Now, put these symbols in your statement

result = (result * (a * c) % mod)% mod;

Now you can guess the error.

result * (a * c) % mod has same execution sequence as (result * a * c) % mod, and each of result, a & c can have values upto 998244353, hence, this will result in long long limit overflow.

The correct statement should be of the following form:

result = (result * ( (a * c) % mod ) )% mod;

Just, replace the statement, with the following one, and your code will be accepted.

result = (result*((fact[x[i]]*binaryExp((fact[c[i]]*fact[x[i]-c[i]])%mod,mod-2))%mod))%mod;

You see, the problem was just because of a pair of parentheses, and it was quite tough to spot the error, so I would suggest, rather than writing a large expression in a single statement, you should always evaluate it by dividing into simpler forms. This will make your code more readable and easy to debug.

mayureshpatle · September 1, 2020, 7:20pm

You’re violating this condition

print the name having maximum score, if there are more than one such names, then print the one which appeared first in the list.

P.S. There’s one more error in your code,

lli inv=power(den,mod-2,mod)%mod;
return (res*inv)%mod;

These two statements should be placed after the closing bracket, i.e outside the loop’s body, you’ve written them inside the loop’s body.

nalingoyal · September 2, 2020, 4:45am

thanks man… i got my mistake…actually i thought Unorderd_map keeps the thing in the order they are inserted…but that was not correct…thanks for helping me…
people like you are the reason why i like codechef discussion forum so much

anon80320319 · September 2, 2020, 11:54am

Thank you very much