PROBLEM LINK:
Editorialist: Rohit Mazumder
PREREQUISITES:
Basic Dynamic Programming
PROBLEM:
A subsequence of a word can be formed by dropping some of its letters and keeping the relative order of the rest of the letters same. Given a word, and an integer k, you have to find the number of distinct subsequences of length k you can form out of the word.
QUICK EXPLANATION:
We start our solution with the help of the small hint provided in the question itself: “Observe that the number of subsequences of length k of abcbbcaacaab that end in a ‘c’ is the same as the number of subsequences of length k−1 of abcbbcaa”. Notice that for every distinct subsequence of length k-1 formed from the portion of the string marked in yellow (Fig. 1), if we add a ‘c’ at end of them, then we will get another set of distinct subsequences of length k, each ending with the letter ‘c’! Using this observation we will now build up our solution.
EXPLANATION:
Let DP[i, j] be the number of distinct subsequences of length i in the substring formed by the first j characters of the word.
Let’s take the example of the word ‘banana’. We will find the number of distinct subsequences of length 3 in the substring ‘banana’.
The distinct subsequence of length 2 in ‘banan’ : {ba , bn, an, aa, na, nn}.
Hence DP[2,5] = 6
On appending an ‘a’ at the end of these we get a set of newly formed subsequences of length 3 in ‘banana’ :{baa, bna, ana, aaa, naa, nna}
The distinct subsequence of length 3 already present in ‘banan’ : {ban, baa, bna, bnn, ana, aan, ann, nan}
Hence, DP[3,5] = 8
We find that the subsequences {baa, bna, ana} is being repeated in both the newly formed set and the existing set. Hence we need to subtract the number of subsequences in the existing set ending with ‘a’. So, DP[3,6]=6+8-3=11.
Hence, we find that simply appending the ith character of the string to the previous subsequences might violate the condition that the counted subsequences are distinct. We also note that the only subsequence we overcount are those (of length k-1) in which we had appended the previous ith character.
DP[i-1,p-1] has the number of subsequences of length i-1, formed with the first (p-1) characters of the string. So if we append the pth character of the string to these, we get the number of subsequences of length i, each ending with the pth character of the string.
So to subtract the overcounted subsequences we need to simply subtract DP[i-1, p-1] where p is the last occurence of the character.
The final recurrence for the problem is,
DP[i, j] = DP[i-1, j-1] + DP[i, j-1] – DP[i-1,p-1] where p = previous occurrence of jth character.
where DP[i-1, j-1] = Number newly formed subsequences of length i formed by appending the jth character to the subsequences of length i-1 in the section [1:j-1] of the string.
DP[i,j-1] = Number of existing subsequences of length i in the section [1:j-1] of the string
DP[i-1, p-1] = Number of subsequences of length i overcounted, which are ending with the jth character of the string. Here p is the previous occurence of the jth character.
Solving the given example:-
We maintain a table to easily compute value of DP[i,j]
Number of distinct subsequence of length 3 in ‘banana’
Step 1: Fill the first row with 1’s as the number of distinct subsequence of length ‘0’ always 1. Fill the first column with 0’s (except the first element) as the number of subsequences of non-zero length in an empty string is 0
0 (_) | 1 (b) | 2 (a) | 3 (n) | 4 (a) | 5 (n) | 6 (a) | |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1 | 0 | ||||||
2 | 0 | ||||||
3 | 0 |
Step 2: Fill the rest of column according to the recurrence
DP[i, j] = DP[i-1, j-1] + DP[i, j-1] – DP[i-1, p -1 ].
Eg: As shown in the table, DP[3,6] = DP[2,5]+DP[3,5]-DP[2,3] = 6 + 8 -3 = 11.
0 (_) | 1 (b) | 2 (a) | 3 (n) | 4 (a) | 5 (n) | 6 (a) | |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1 | 0 | 1 | 2 | 3 | 3 | 3 | 3 |
2 | 0 | 0 | 1 | 3 | 5 | 6 | 6 |
3 | 0 | 0 | 0 | 1 | 4 | 8 | 11 |
Subtask 1:
- | t | i | n | n | i | t | u | s | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 4 | 5 |
2 | 0 | 0 | 1 | 3 | 4 | 6 | 9 | 12 | 16 |
3 | 0 | 0 | 0 | 1 | 3 | 7 | 13 | 22 | 34 |
Answer : 34
Subtask 2:
- | g | o | b | b | l | e | d | y | g | o | o | k | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1 | 0 | 1 | 2 | 3 | 3 | 4 | 5 | 6 | 7 | 7 | 7 | 7 | 8 |
2 | 0 | 0 | 1 | 3 | 4 | 7 | 11 | 16 | 22 | 29 | 35 | 35 | 42 |
3 | 0 | 0 | 0 | 1 | 3 | 7 | 14 | 25 | 41 | 63 | 92 | 98 | 133 |
4 | 0 | 0 | 0 | 0 | 1 | 4 | 11 | 25 | 50 | 91 | 154 | 183 | 281 |
Answer: 281
Subtask 3:
- | g | a | r | g | a | n | t | u | a | n | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1 | 0 | 1 | 2 | 3 | 3 | 3 | 4 | 5 | 6 | 6 | 6 |
2 | 0 | 0 | 1 | 3 | 6 | 8 | 11 | 15 | 20 | 23 | 26 |
3 | 0 | 0 | 0 | 1 | 4 | 10 | 18 | 29 | 44 | 58 | 73 |
4 | 0 | 0 | 0 | 0 | 1 | 5 | 15 | 33 | 62 | 102 | 150 |
5 | 0 | 0 | 0 | 0 | 0 | 1 | 6 | 21 | 54 | 115 | 212 |
Answer: 212