UNIQUEDECODE - Editorial

iceknight1093 · January 24, 2024, 4:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: mamaleshrh
Tester: raysh_07
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

None

PROBLEM:

You’re given a string S containing only 'a’s and 'b’s.
Find the number of its subsequences that are uniquely decodable.
A decoding of a string is done as follows: partition it into substrings “a”, “ab”, and “aba”, then:

“a” is replaced with 1
“ab” is replaced with 2
“aba” is replaced with 3

EXPLANATION:

First, let’s figure out what a uniquely decodable string looks like.
We want to partition it into substrings “a”, “ab”, and “aba”.
In particular:

The first character of the string can’t be ‘b’, since all three substrings start with ‘a’.
If two 'b’s are adjacent to each other, there’s no way to partition the string at all.

Now, let’s look at a string and see how we can partition it, and what conditions allow us to do so uniquely.
Look at the last character of the string.

If the last character is ‘b’, we have no choice: the only valid substring ending with ‘b’ is “ab”, so we must remove the last two characters using it, and partition the remaining string.
If the last character is ‘a’, there are two possibilities: we can use either “a”, or “aba”.
- “a” can always be used, i.e, cutting out the last character.
- “aba” can be used only when the last three characters are, as it says, “aba”.
  Notice that if this is the case though, the string isn’t uniquely decodable! Instead of using “aba”, we can use “ab” and “a” to get a different decomposition using the same characters.

Together, this tells us that a string is uniquely decomposable if and only if, when partitioning from the back, we never get the chance to use “aba” - meaning our moves are forced (if the last character is ‘b’, use “ab”; otherwise use “a”).
It’s not hard to see that the only strings satisfying this are those of the form

aaaa\ldots aaababab\ldots abab

That is, several occurrences of “a” followed by several occurrences of “ab”.

Now, we need to count the number of such subsequences of S.
One way of doing that is to maintain the count of a few different types of subsequences, and try extending them when moving left to right.
Let:

\text{ans} denote the answer.
c_a denote the number of 'a’s seen so far.
c_{ab} denote the number of subsequences of the form aaa\ldots aabab\ldots ab that end with ‘b’.
c_{ba} denote the number of subsequences of the form aaa\ldots aabab\ldots aba that end with ‘a’.

Initially, all four of these values are 0.
Now, for each i from 1 to N, let’s try to figure out how many new subsequences of each type will be created using index i.

If S_i = b, only c_{ab} will change.
Specifically, c_{ab} \to c_{ab} + c_a + c_{ba}, because a subsequence of that form can be created by appending b to any subsequence ending with ‘a’.
If S_i = a, c_a and c_{ba} will change.
Specifically,
- c_a \to c_a + c_a + 1, because we can either append this index to any of the old subsequences ending with ‘a’, or take it alone.
- c_{ba} \to c_{ba} + c_{ab}, because the only way to create a new subsequence of this form is to append ‘a’ to one that ends with ‘b’.

The final answer is c_a + c_{ab}.
Make sure to do all additions under modulo.

TIME COMPLEXITY:

\mathcal{O}(N) per testcase.

CODE:

Author's code (C++)

#include <bits/stdc++.h>
// #include<ext/pb_ds/assoc_container.hpp>
// #include<ext/pb_ds/tree_policy.hpp>
// using namespace __gnu_pbds;
//#define oset tree<int, null_type,less<int>, rb_tree_tag,tree_order_statistics_node_update>//oset name name.order_of_key(#ele<k) *name.find_by_order(index) less_equal greater greater_equal

#define vi vector<int>
#define pii pair<int,int>
#define mii map<int,int>
#define rep(i, a, b) for (int i = a; i < b; i++)
#define int long long
#define ld long double
#define pb push_back
#define all(v) v.begin(), v.end()
#define back(v) v.rbegin(), v.rend()
#define yes cout << "YES" <<"\n";
#define no cout << "NO" <<"\n";
#define deb(a) cerr<< #a << "=" << (a)<<"\n";
#define nl "\n"
#define FastIO            \
    ios_base::sync_with_stdio(0); \
    cin.tie(0);                   \
    cout.tie(0)
using namespace std;
#define mod 1000000007
const int oo = 1e18;
void solve()
{
    string s;cin>>s;
    int ab_end_a=0;
    int ab_end_b=0;
    int a_end_a=0;
    for(int i=0;i<s.size();i++){
        if(s[i]=='a'){
            ab_end_a=(ab_end_a + ab_end_b)%mod;
            a_end_a=(2*a_end_a+1)%mod;
        }else{
            ab_end_b=(ab_end_b + ab_end_a + a_end_a)%mod;
        }
    }
    cout<<(a_end_a+ ab_end_b)%mod<<nl;

}
signed main()
{
    FastIO;
    // freopen("6.in", "r", stdin);
    // freopen("6.out", "w", stdout);
    // int test=1;
    int test;
    cin >> test;
    while (test--)
    {
        solve();
    }
    return 0;
}

Tester's code (C++)

#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18
#define f first
#define s second

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

const int mod = 1e9 + 7;

void Solve() 
{
    string s; cin >> s;
    
    int dp[2][2];
    dp[0][0] = dp[0][1] = dp[1][0] = dp[1][1] = 0;
    // dp[0][0] --> ending at a / b, alternating phase or not 
    // b starts alternating phase 
    for (auto x : s){
        if (x == 'a'){
            dp[0][0] += 1 + dp[0][0];
            dp[0][1] += dp[1][1];
            dp[0][0] %= mod;
            dp[0][1] %= mod;
        } else {
            dp[1][1] += dp[0][0] + dp[0][1];
            dp[1][1] %= mod;
        }
    }
    
    cout << (dp[0][0] + dp[1][0] + dp[1][1]) % mod << "\n";
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}

Editorialist's code (Python)

mod = 10**9 + 7
for _ in range(int(input())):
    s = input()
    ans, only_a = 0, 1
    alt_a, alt_b = 0, 0
    for c in s:
        if c == 'a':
            ans = (ans + only_a) % mod
            alt_a = (alt_a + alt_b) % mod
            only_a = (only_a * 2) % mod
        else:
            ans = (ans + alt_a + only_a - 1) % mod
            alt_b = (alt_b + alt_a + only_a - 1) % mod
    print(ans % mod)

prabhuomkar9 · January 24, 2024, 4:37pm

HELP

Can anyone take a look at my code and explain me the bug, i know this is the most naive code i could come up with but i just want to know which edge case i missed. I tried to visualize the recursion with Recursion Tree Visualizer and found out the tree stops growing when i dont include the first character and continue, I dont know what is the bug tho as if i skip any other character the tree does grow till end.

t = int(input())

while t > 0:
    t -= 1
    s = input()

    count = [0]

    def dfs(i, curr, isAB):
        if i >= len(s):
            count[0] += 1 if curr else 0
            return

        if isAB:
            dfs(i + 1, curr, isAB)

            if curr and curr[-1] == "a" and s[i] == "b":
                dfs(i + 1, curr + "b", isAB)
            elif curr and curr[-1] == "b" and s[i] == "a":
                dfs(i + 1, curr + "a", isAB)
        else:
            if s[i] == "a":
                dfs(i + 1, curr, isAB)
                dfs(i + 1, curr + "a", isAB)
            if curr and s[i] == "b":
                dfs(i + 1, curr, isAB)
                dfs(i + 1, curr + "b", not isAB)

    dfs(0, "", False)

    print(count[0] % (10 ** 9 + 7))```