SUBSFREQ - Editorial

PROBLEM LINK:

Division 1
Division 2

Author: Temirlan Baibolov
Tester: Данило Мочернюк
Editorialist: Ritesh Gupta

DIFFICULTY:

Medium

PREREQUISITES:

Dynamic Programming, Greedy, Observation

PROBLEM:

You are given a sequence of integers A_1, A_2, \ldots, A_N. For each non-empty subsequence B of this sequence, the beauty will be the integer with the largest number of occurrences in that subsequence. If there are several options, choose the smallest one and note it down.

For each integer between 1 and N inclusive, find out how many subsequences have it as it answers. Since these numbers may be very large, compute them modulo 1,000,000,007 (10^9 + 7).

QUICK EXPLANATION:

  • We need to find an element from subsequences (based on the condition mentioned in the problem) which does not depend on the relative ordering of elements of the subsequence.

  • If given elements are pairwise distinct then we can say for any subsequence the beauty would be the minimum element present in that subsequence.

  • Let’s find the frequency (say F_x) of each x in A such that 1 <= x <= N. For a particular value of x, we find the count of such subsequences for which the beauty is x. If we compute this for each f_x such that 1 <= f_x <= F_x, then answer for x is the sum of all these beauties. This can be done by using a two-dimensional matrix, where rows represent the number of occurrences, and columns represent the unique integers in A. This approach is sufficient to pass the subtask 1.

  • To solve the problem for 100 points, we can compute the beauty greedily and use a simple inclusion-exclusion principle. For each i from 1 to N, we have to find all the elements of the given sequence A which have the frequency less than or equal to i. Then we can start computing the subsequences that have a maximum frequency equal to i, for each i from 1 to N.

  • Since in case of two or more elements with the same frequency the answer for that subsequence is the smaller element, we need to iterate over all the numbers in descending order.

EXPLANATION:

Subtask #2: As it is comparatively easier than subtask 1. We are given only distinct values. How can we use this information? So, if values are distinct then the beauty for any subsequence is just the minimum element of that subsequence. So, let’s sort the given sequence A in decreasing order and then for each i from 1 to N, compute how many subsequences have beauty equal to A_i.

  • For i = 1, the only valid subsequence is [A_1]
  • For i = 2, the valid subsequences are [A_2], [A_2, A_1]
  • For any i, the valid subsequences are 2^{i-1}

Subtask #1: We need to calculate the number of subsequences in which i is the smallest element that occurs the maximum number of times, for each i. Suppose a number x appears F_x times in the given sequence A then we can say the count of subsequences for which beauty equal to x is the sum of:

  • subsequences in which frequency of x is equal to 1
  • subsequences in which frequency of x is equal to 2
  • subsequences in which frequency of x is equal to F_x

Let’s solve any one of them, rest are solved similarly. So, let’s say we are calculating for some frequency y, then:

  • There are {F_x\choose y} ways to choose y times x from F_x.

  • If we are adding some number p in this subsequence then there are two cases:

    1. p < x then we can add at most (y-1) numbers of type p because if we add more than that then the beauty of this subsequence would become p. So, for each such p we need to find the ways in which we can select it. Let’s denote it by u_p.
    2. p > x then we can add at most y numbers of type p as if we add more than that then the beauty of this subsequence would become p (With same frequency, the answer is smallest one). So, for each such p we need to find the ways in which we can select it. Let’s denote it by v_p.
  • Now the answer is {F_x\choose y} * U_x * V_x.

U_x = product of u_p for all p such that p > x and u_p is the sum of the number of ways we can choose p at most (y-1) times out of F_p.

V_x = product of v_p for all p such that p < x and v_p is the sum of the number of ways we can choose p at most y times out of F_p.

Now, how to calculate this? We can create a 2D table T of dimension N*N. Sort the given sequence and compute the frequency array F where F_x represents the frequency of element x in the given sequence A. Any T_{i, j} of this 2D table represents the number of subsequences for which the answer is i(>= 0) and having the frequency j(>= 0).

T_{i, j} = {F_i\choose j} * U_i * V_i

NOTE: You can even notice that you only need the table size equal to (number of distinct elements in given sequence A) * ( the maximum frequency over all elements of A).

Subtask #3: We will not be able to create any 2D table because constraints are too high for that. So, how can we improve our solution? Let’s try to change the way we are calculating the answer. If we observe carefully then at any point of time, we only need (product of T_{p, j} for p > i) * (product of T_{p, j-1} for p < i) in O(1). How can we achieve it?

For each i from 1 to N, we first compute the elements which can have a frequency equal to i in any valid subsequence. Then we sort each of them in descending order (so that if there are two or more elements with the same frequency then the answer for that subsequence is the smaller element)

Suppose A = [1, 2, 1, 3, 1, 3] then

  • Numbers that have frequency = 1\{3, 2, 1\}
  • Numbers that have frequency = 2\{3, 1\}
  • Numbers that have frequency = 3\{1\}

So, now we are computing answers in order of frequency. What does that mean? It means that we first calculate the answer for all numbers which can have frequency i, for each i from 1 to N.

Now, since we are processing the elements in this specific order then at any point of time we only cover subsequences which either have less frequency than the frequency of the current element or if have the same frequency then the element which has that frequency is greater than the current element which we are processing. Hence, the answer for y-th frequency of element x is equal to the number of ways we can select y out of F_x multiplied by the number of subsequences which we already covered except the ones for which the answer is x.

Let totalCoveredSubsequences be the total number of subsequences for which we computed the answer. Then for removing subsequences for which the answer is x and has a frequency at most y-1 is equivalent to dividing the totalCoveredSubsequences/numberOfWays[x] where numberOfWays[x] is equal to the sum of {F_x\choose i} for all i from 0 to y-1.

TIME COMPLEXITY:

TIME: O(N)
SPACE: O(N)

SOLUTIONS:

Tester's Solution
#include <bits/stdc++.h>
#define ll long long
#define pb push_back
#define x first
#define y second    
#define mp make_pair
#define sz(a) ((int)(a.size()))
using namespace std;
const int mod = 1000 * 1000 * 1000 + 7;
const int INF = 1000 * 1000 * 1000;
const int MAX = 500005;
int cnt[MAX];
int cnt2[MAX];
int answer[MAX];
int fact[MAX];
int inv[MAX];
int prefC[MAX];
vector<int> byCnt[MAX];
int add(int a, int b)
{
    return a + b >= mod ? a + b - mod : a + b;
}
int sub(int a, int b)
{
    return a - b < 0 ? a - b + mod : a - b;
}
int mult(int a, int b)
{
    return a * (ll)b % mod;
}
int c(int n , int k)
{
    return mult(fact[n] , mult(inv[n - k] , inv[k]));
}
int modPow(int a, int step)
{
    int ans = 1;
    while(step)
    {
        if(step & 1)
            ans = mult(ans , a);
        a = mult(a , a);
        step >>= 1;
    }
    return ans;
}
int main() {
    
    ios_base::sync_with_stdio(0);
    fact[0] = inv[0] = 1;
    for(int i = 1; i < MAX; i++)
    {
        fact[i] = mult(fact[i - 1] , i);
        inv[i] = modPow(fact[i] , mod - 2);
    }
    
    int T;
    cin >> T;
    assert(1 <= T && T <= 100);
    int total = 0;
    for(int i = 1; i <= T; i++)
    {
        int n;
        cin >> n;
        for(int i = 1; i <= n; i++)
            prefC[i] = 1;
        assert(1 <= n && n <= 500000);
        total += n;
        assert(total <= 500000);
        vector<int> a(n);
        for(auto &x : a)
        {
            cin >> x;
            assert(1 <= x && x <= n);
            cnt[x]++;
            byCnt[cnt[x]].pb(x);
        }
        for(int i = 1; i <= n; i++)
        {
            sort(byCnt[i].begin() , byCnt[i].end());
            reverse(byCnt[i].begin() , byCnt[i].end());
        }
        ll all = 1;
        for(int i = 1; i <= n; i++)
            for(auto x : byCnt[i])
            {
                ll except = mult(all , modPow(prefC[x] , mod - 2) );
                cnt2[x]++;
                answer[x] = add(answer[x] , mult(except , c(cnt[x] , cnt2[x])));
                
                all = mult(all, modPow(prefC[x] , mod - 2));
                prefC[x] = add(prefC[x] , c(cnt[x] , cnt2[x]));
                all = mult(all, prefC[x]);
            
            }
        
        for(int i = 1; i <= n; i++)
            cout << answer[i] << " ";
        cout << endl;
        for(int i = 1; i <= n; i++)
        {
            answer[i] = cnt[i] = cnt2[i] = 0;
            byCnt[i].clear();
        }
    }
}
3 Likes

Hello there!
Actually I tried the approach for subtask 1 and subtask2, but it is showing TLE. Though there this is O(n^3) but still TLE. Can you pls look into my method and tell why TLE. it is passing all given test cases

https://www.codechef.com/viewsolution/37202658

thanks in advance

@rishup_nitdgp Can you pls tell what is the function modPow doing and what is prefC being used for?

1 Like

@codeverything modPow is used to compute mod inverse of number.
as modInv(x) = modPow(1x, mod - 2);
this is used for division in modular arithmetics. (a / b) % m = a * (modInv(b))

for any freq f, prefC[x] is storing number of ways selecting x at most f times.

1 Like

@vpinmrcool thanks, got it now.

What if we use permutations and combinations.

For example {1,2,3,4,5} are those number that are occurring 1 at least one time

Then beauty for 1= 4! +1 for 2 =3!+1

For {2,3} with frequency of two

2= (2+(number of elements with previous Freq) ) ! +2

If you are interested just reply I’ll explain more properly

1 Like