GDSUB - Editorial

anand20 · September 11, 2019, 1:36pm

PROBLEM LINKS :

Contest : Division 1

Contest : Division 2

Practice

Setter : Naman Bansal

Tester : Istvan Nagy

Editorialist : Anand Jaisingh

DIFFICULTY :

Easy

PREREQUISITES :

Basic Dynamic Programming

PROBLEM :

You are given a sequence of integers of length N, consisting of \approx 1000 distinct values. Now, you need to find the number of subsequence of length K of this sequence, such that each pair of elements in the selected subsequence is pair-wise distinct.

QUICK EXPLANATION :

A quick look at the constraints, and we can prove that the given array A[] can consist of at max 1007 distinct values. So, K shall always be \le1007 . Then, we can group numbers with equal values together. Then, the task comes down to finding the number of ways to select K numbers, such that we pick at most 1 number from each group. To accomplish this, we can do some dynamic programming, which runs in approx O(X^2) time, where X \approx 1000

EXPLANATION :

Let’s begin by having a look at the constraints. It states that each A_i \le 8000 , and that the numbers A_i are prime. Now, we know that prime numbers occur less frequently than their composite counter parts.

In fact, the number of prime numbers \le 8000 is exactly 1007 . So, we know that the array given in the input shall also consist of at max 1007 distinct values of A_i. We initially needed to find the number of subsequences consisting of at most K distinct values, where K \le 100,000.

But since the maximum number of distinct values in any subsequence of this array is always \le 1007, so we can take K=min(K,1007). Now, we need to solve the original problem using these modified constraints.

Another observation we can make is that the ordering of elements in a subsequence is not important. For example, if we pick some valid subsequence a_{k_1},a_{k_2}...a_{k_z} , \hspace{0.2cm} z \le K , a_i \ne a_j if k_i \ne k_j , and k_1 < k_2 ... < k_ z , then in what ever order we arrange the elements a_{k_i} , it does not make a difference, they do not break the pairwise distinctness property.

So, the original problem now boils down to finding the number of subsets \{ k_1,k_2...k_z \} \hspace{0.2cm} z \le K , of the set \{1,2,...N \} , such that a[k_i] \ne a[k_j] if k_i \ne k_j

We can do a dynamic programming for the following. Let the number of distinct values appearing in the array be ctr. Then we can replace every distinct value appearing in the array by an integer in the range [1,ctr] and our answer remains unchanged.

Also, for each distinct value, let’s maintain a variable cnt[i] that states the number of times i appears in the given array a. Then we can do the following dynamic programming, that is in a way extremely similar to how to construct pascal triangle for binomial coefficients but with a small change :

dp[i][0]=1 \hspace{0.2cm} \forall \hspace{0.2cm} i \ge 0

dp[i][j] = dp[i-1][j]+ dp[i-1][j-1] \cdot cnt[i]

The number we are looking for is \sum_{i=0}^{K} dp[ctr][i]

Here, dp[i][j] indicates the number of valid subsets of size j we can pick considering only array values \le i

For a fixed (i,j) , the term dp[i-1][j] considers the valid subsets using array values < i and the second one ensures we pick a single instance of i, and add it to valid subsets of size j-1 consisting of array values < i .

Since we have cnt[i] instances of element i in the array, so we can pick exactly one among them in cnt[i] ways.

On a side note :

Assume we want to solve this problem without the additional constraint of A_i \le 8000 and A_i being prime. Then, N and K can be as large 10^5. The answer is then given by :

[x^k] \prod_{i=1}^{ctr} (1+ cnt[i] \cdot x)

That is, the answer is given by the coefficient of [x^k] in the polynomial :

(1+cnt[1] \cdot x) \cdot (1+ cnt[2] \cdot x) \cdot ......\cdot (1+cnt[ctr] \cdot x)

We can multiply all these smaller polynomials in O(N \cdot log^2 N ) time using high precision FFT Modulo 10^9+7

That’s it, Thank you !

Your comments are welcome !

COMPLEXITY ANALYSIS :

Time Complexity : O( X^2) , where X \approx 1000

Space Complexity : O( X^2) , where X \approx 1000

SOLUTION LINKS :

Setter

#include<bits/stdc++.h>
using namespace std;
 
#define modulo 1000000007
long long dp[1008][1008];
 
int main()  {
    ios_base::sync_with_stdio(0);
    cin.tie(0);cout.tie(0);
    int n, k;
    cin >> n >> k;
    int x;
    map<int, int> m;
    for(int i = 0; i < n; i++)  {
        cin >> x;
        m[x]++;
    } 
    vector<long long> v;
    for(auto i = m.begin(); i != m.end(); i++)  {
        v.push_back(i->second);
    }
    for(int i = 0; i < v.size(); i++)   {
        for(int j = 1; j <= v.size(); j++)  {
            dp[i][j] = 0;
        }
    }
    for(int i = 0; i < v.size(); i++)   {
        dp[i][0] = 1;
    }
    dp[0][1] = v[0];
    for(int i = 1; i < v.size(); i++)   {
        for(int j = 1; j <= v.size(); j++)  {
            dp[i][j] += dp[i - 1][j];
            dp[i][j] = dp[i][j] % modulo;
            dp[i][j] += (dp[i - 1][j - 1]) * v[i];
            dp[i][j] = dp[i][j] % modulo;
        }
    }
    long long ans[v.size() + 1];
    ans[0] = 1;
    for(int i = 1; i <= v.size(); i++)   {
        ans[i] = ans[i - 1] + dp[v.size() - 1][i];
        ans[i] = ans[i] % modulo;
    }
    /*for(int i = 0; i < v.size(); i++)   {
        for(int j = 0; j <= v.size(); j++)  {
            cout << dp[i][j] << " ";
        }
        cout << "\n";
    }
    for(int i = 0; i <= v.size(); i++)  {
        cout << ans[i] << " ";
    }
    cout << "\n";*/
    if(k <= v.size())   {
        cout << ans[k];
    }
    else    {
        cout << ans[v.size()];
    }
    return 0;
}

Tester

#include <cstdio>
#include <vector>
#include <bits/stdc++.h>
 
using namespace std;
 
int main()
{
	const int MAXA = 8002;
	const int MOD = 1e9 + 7;
	int N, K, tmp, sum = 0;
	scanf("%d %d", &N, &K);
	vector<int> A(MAXA), dp(MAXA);
	dp[0] = 1;
 
	for (int i = 0; i < N; ++i)
	{
		scanf("%d", &tmp);
		A[tmp]++;
	}
 
	for (int i = 1; i < MAXA; ++i)
	{
		for (int j = i; j > 0; --j)
		{
			dp[j] = (dp[j] + static_cast<int64_t>(dp[j - 1]) * A[i]) % MOD;
		}
	}
 
	for (int i = 0; i <= K && i < MAXA; ++i)
	{
		sum = (sum + dp[i]) % MOD;
	}
	printf("%d\n", sum);
	return 0;
}

Editorialist

//#pragma GCC optimize("Ofast,no-stack-protector")
//#pragma GCC target("sse,sse2,sse3,ssse3,sse4,popcnt,abm,mmx")
//#pragma GCC target("avx,tune=native")
// Anand Jaisingh
 
#include<bits/stdc++.h>
 
using namespace std;
 
typedef complex<double> base;
typedef long double ld;
typedef long long ll;
 
#define pb push_back
#define pii pair<int,int>
#define pll pair< ll , ll >
#define vi vector<int>
#define vvi vector< vi >
const int maxn=(int)(2e5+5);
const ll mod=(ll)(1e9+7);
bool pr[maxn];
int a[maxn],cnt[maxn];
int dp[1100][1100];
 
int add(int a,int b)
{
    int ret=a+b;
 
    if(ret>=mod)
    {
        ret-=mod;
    }
 
    return ret;
}
 
int mul(int a,int b)
{
    ll ret=a*1ll*b;
 
    if(ret>=mod)
    {
        ret%=mod;
    }
 
    return (int)ret;
}
 
int main()
{
    ios_base::sync_with_stdio(0);cin.tie(0);
 
   int n,k;cin>>n>>k;
 
   for(int i=0;i<n;i++)
   {
       cin>>a[i];
 
       cnt[a[i]]++;
   }
 
   vector< int > v;v.pb(-1);
 
   for(int i=0;i<8000;i++)
   {
       if(cnt[i]>0)
       {
           v.pb(cnt[i]);
       }
   }
 
   assert(v.size()<=1010);
 
   k=min(k,(int)v.size());
 
   dp[0][0]=1;
 
   for(int i=1;i<v.size();i++)
   {
       dp[i][0]=1;
 
       for(int j=1;j<=k;j++)
       {
           dp[i][j]=add(dp[i-1][j],mul(dp[i-1][j-1],v[i]));
       }
   }
 
   int ret=0;
 
   for(int i=0;i<=k;i++)
   {
       ret=add(ret,dp[v.size()-1][i]);
   }
 
   cout<<ret<<endl;
 
 
    return 0;
}

g0ku007 · September 16, 2019, 6:45pm

I am curious if there exists any combinatorics formula for it? We can iterate over [1…k] and keep adding the count of good subsequences of size “i” in range(1…k). Since k has to be max 1007.

Thank you!

anand20 · September 16, 2019, 6:48pm

I think the combinatorics way of going for this problem is via FFT ( in the last part of the editorial ), and the slower DP solution, is the intended one for this problem.

g0ku007 · September 16, 2019, 6:52pm

I was breaking my head all days finding some combinatorics formula. I did find FFT, which is stated in the editorial, but i am yet to learn that concept. Anyways many thanks for the reply and editorial too!

kumaran_14 · September 16, 2019, 8:13pm

Could you provide resources for learning fft and similar problems along with it too?

teitus · September 17, 2019, 1:30am

Worth mentioning Tester’s solution - which requires O(X) space. // Note A[i] can be 0… Editorialist’s removed empty.

Bottom up DP, where DP[i] is the SeqsOfLen[i]. And you cycle through all the ‘groups’ of numbers with the same values (see Editorialist’s v). Given a seq of length i, if you have N x’s, you can create N seqs of length i + 1 by using each of them (remember numbers have the same value -x-, but are counted as different).

Return the sum of all lengths.

quite untidy code, but small enough: (note I shadowed n variable in the countSeq function - which is just the group’s size - not clean ) CodeChef: Practical coding for everyone

akkivarma · September 17, 2019, 11:31am

I used the vieta’s formula to solve this problem. You can find more info here Vieta’s formulas

kunalcodr · September 17, 2019, 12:02pm

That explanation is not very clear to me .please , can you explain in easy way .

yashwant_singh · September 17, 2019, 12:54pm

nice approach bro
can you elaborate how did you applied it to to the question.

new_trans · September 17, 2019, 1:19pm

Any resources for “high precision FFT, using modulo 1e9 + 7”?

adsk3003 · September 17, 2019, 2:10pm

Link to your solution ?

akkivarma · September 17, 2019, 2:37pm

Here is my solution Solution link
Credits: Vieta’s formula implementation

anand20 · September 17, 2019, 4:42pm

You can read it in this doc : fft_eng.pdf - Google Drive

akshatbhutra · September 17, 2019, 5:20pm

i used newton’s identities for symettric polynomial

archithadge · September 17, 2019, 6:16pm

Basically if x1,x2,x3…xn are no. of appearances of a1,a2,a3…an
it’s a SOP i.e 1+(x1+x2+x3…xn)+(x1x2+x2x3…) upto k terms taken together.
Right!

allenjv123 · September 18, 2019, 11:07am

Aren’t you doing the same thing as the DP formula but in a different way?
If I am wrong can you please give a formal derivation on how you applied the vieta’s formula to the given problem as it is difficult to understand from the code, how you approached the problem.

anon93552178 · September 18, 2019, 12:10pm

@allenjv123
Here’s my approach using Polynomials.

Editorial

allenjv123 · September 19, 2019, 4:57pm

Can someone tell me how we deduce this dp formula with a proof?

dp[i][j]=dp[i−1][j]+dp[i−1][j−1]⋅cnt[i]

Please do help, I am unable to sleep properly due to this!

dardev · September 20, 2019, 2:22am

I am demonstrating the DP in this post. Consider this example: 3 3 2 7 2 5 5.

It can be rewritten as 2 2 3 3 5 5 7. That is (2 repeated 2 times) (3 repeated 2 times) (5 repeated 2 times) and (7 repeated once).

In this case, the minimum size of the set can be 0 (Empty set). Maximum size can only be 4 (irrespective of the value of k).

So if k is 0: Number of possible sets is 1.
What if k is 1: We can either select 2 or 3 or 5 or 7 as our sole element. There are 2 ways to select 2, 2 ways to select 3, 2 ways to select 5, and 1 way to select 7. The total number of ways is 7. Here is how things look like when k is 1:

|2|3|5|7|
|2|2|2|1|

What if k is 2? Here is where things get interesting since recurrence is established.

Let’s consider the first one element (2), is it possible to generate a set of two elements with it? Ans: No. {2,2} is an invalid set. Here is what the table looks like now:

|2|3|5|7|
|2|2|2|1|
|0|

What if we consider the first two elements (2,3). Can we construct a set with 2 elements? Yes! How do we do that?
Step 1: Take all sets with one element {2}. There are “Two such sets”. Refer to Second Row, First Column.
Step 2: To all the above sets, add a {3}. How many ways to do it? There are 2 “3” in the input. So there are two ways to do it.
Step 3: Thus, there are 2 starting sets from step 1, and we can add 3 in 2 ways from step 2. Total possible sets with 2 elements 2,3 are 4. Recurrence will become more clear as we proceed with this exercise

Here is what the DP table looks like:
|2|3|5|7|
|2|2|2|1|
|0|4|

Next, is it possible to construct a two-element set with 2,3,5? (This set MUST contain a five or else it becomes a repetition of the previous case) Yes! In how many ways?
Step 1: Select either a 2 or a 3. This can be done in 2+2 ways. These 2+2 ways can be obtained as the sum of the first and the second column in the first row.
Step 2: Select a 5. This can be done in 2 ways.
Step 3: Obtain total possible sets as the product of step 1 and 2. That is 4*2 = 8 ways.
Here is what the table looks like now:
|2|3|5|7|
|2|2|2|1|
|0|4|8|

Now to the final step with k = 2. We are to construct a two-element set with 2,3,5,7 such that it necessarily contains 7.
Step 1: There are 2+2+2 ways (6 ways) to select one element from 2,3,5
Step 2: There is one way to select the 7.
Step 3: Total sets = 6*1 = 6

Our table looks like
|2|3|5|7|
|2|2|2|1|
|0|4|8|6|

Let’s take a pause here.

There is exactly one way to generate a null set.
There are 2+2+2+1 ways to generate a set with one element. 7 ways.
There are 0+4+8+6 ways to generate a set with two elements. 18 ways.

So if k = 2. If the maximum number of elements was 2, we could have generated 1+7+18 sets. 26 sets!

What if k was 3?
Our table looks like
|2|3|5|7|
|2|2|2|1|
|0|4|8|6|
|0|0|8|12|

Explanation: We cannot generate three-element sets from just 2 and 3. But if we are given 2, 3, 5 - we can take two-element set containing 2 and 3 (total 4 in number from row 3 column 2) and multiply it with the number of possible ways to introduce 5 (2 ways). This becomes 4*2 = 8.
Similarly, we can generate three-element sets with 2,3,5,7 that necessarily contains 7 in (4+8)*1 ways = 12 ways.

So if k was 3 we can generate an additional 20 sets. Yielding a total of (26+20) sets. 46 sets.

What if k was 4 or greater. In that case, we can generate additional sets containing all elements.

Our table looks like
|2|3|5|7|
|2|2|2|1|
|0|4|8|6|
|0|0|8|12|
|0|0|0|8|

If we are to find total elements. Add every call value except the first row and then add one to it to account for the null set. The answer becomes 53+1 = 54.

Let us get rid of the first row in our DP and then formulate the principle:

|2|2|2|1|
|0|4|8|6|
|0|0|8|12|
|0|0|0|8|

Here is the final DP formula:

First row to contain frequency of elements. 2, 2, 2, 1 in this case.
Starting second row, For any cell Cij, add all elements in previous row (i-1 row) until j-1 column and multiply it with C0j

Let me know if it’s not clear.
Here is the implementation: CodeChef: Practical coding for everyone

mohitmm · September 21, 2019, 11:37am

yes, this is the main observation!