PASHA - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: neerav_1
Testers: iceknight1093, rivalq
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Prefix sums, maps

PROBLEM:

You’re given a string A and an integer K.
A character c is called good with respect to a substring B of A if
2\cdot \text{freq}(B, c) - |B| = K

Find the sum of the count of good characters across all substrings.

EXPLANATION:

Instead of fixing a substring and counting the number of good characters it has, let’s fix a character and count the number of substrings it’s good for.

To this end, let’s analyze what it means for character c to be good for substring A[i\ldots j].

Analysis

The length of this substring is j-i+1, so we want
2\cdot \text{freq}(A[i\ldots j], c) - (j-i+1) = K.

Let \text{pref}_c[i] be the number of occurrences of c in A[1\ldots i].
We have \text{freq}(A[i\ldots j], c) = \text{pref}_c[j] - \text{pref}_c[i-1], so the above equation then reduces to

2\cdot(\text{pref}_c[j] - \text{pref}_c[i-1]) - (j-i+1) = K

which with a bit of rearrangement becomes

(2\cdot \text{pref}_c[j] - j) - K = 2\cdot\text{pref}_c[i-1] - (i-1)

Notice that the left side depends only on j, while the right side depends only on i.
We’ll use this fact to compute the answer.

For a fixed character c, we want to compute the number of pairs (i, j) such that the equation

(2\cdot \text{pref}_c[j] - j) - K = 2\cdot\text{pref}_c[i-1] - (i-1)

holds. Also, we need to ensure that c actually occurs in A[i\ldots j].

This can be done using a map, as follows:

  • Let \text{ct} be a map, initially empty.
    • \text{ct}[x] will denote the number of indices i so far such that 2\cdot\text{pref}_c[i-1]-(i-1) = x
  • Iterate j from 1 to N.
  • For a fixed j, the number of valid i \leq j is simply \text{ct}[(2\cdot \text{pref}_c[j] - j) - K], because that’s exactly what our map holds. So, add that value to the overall answer.
  • Now we need to update our map with index j, which corresponds to increasing \text{ct}[2\cdot \text{pref}_c[j] - j]. by 1.

This almost solves the problem: the only thing we haven’t done is account for the fact that c needs to occur in the range.
Notice that, in terms of \text{pref}_c, we essentially need to ensure that \text{pref}_c[i-1] \lt \text{pref}_c[j].

There are several ways to accomplish this.
One way is to delay updates to the map. That is,

  • Keep a list of updates that need to be performed, say L. Initially, L = [0], corresponding to 2\cdot \text{pref}_c[0] - 0 obtained from i = 0 (note that we’re treating the string as 1-indexed here, so index 0 is ‘before’ the string starts).
  • At index j, if S_j = c, perform all updates that are pending in L, and then clear L. This is because S_j = c means \text{pref}_c[j] is higher than any value before it.
  • Afterwards, append 2\cdot \text{pref}_c[j] - j to L as a pending update.

By delaying updates in this manner, we obtain exactly what we need: the number of substrings for which character c is good.
Notice that while we may perform many updates at once, their total number is still at most N, since we do at most one for each index.
For a fixed c, this takes \mathcal{O}(N) or \mathcal{O}(N\log N) time depending on the type of map used.

Doing this separately for each c from a to z gives us a solution in \mathcal{O}(26N\log N).

TIME COMPLEXITY

\mathcal{O}(26N) or \mathcal{O}(26N\log N) per test case.

CODE:

Setter's code (C++)
#include <bits/stdc++.h>
using namespace std;

long long int goodCharacters(int N, int K, string &s)
{
	long long int ans = 0;
	for(char i = 'a'; i <= 'z'; ++i)
    {
        unordered_map<int, int> mp;
        if(s[0]==i)
        {
        mp[0] = 1;
        }
        int temp = 0;
        int index = 0;
        while(index < N)
        {
            int j = index;
            int temp1 = temp;
            while(j < N && (j==index || s[j]!=i))
            {
                if(s[j]==i)
                {
                    temp1++;
                }
                else
                {
                    temp1--;
                }
                if(mp.find(temp1-K)!=mp.end())
                {
                ans+=mp[temp1-K];
                }
                j++;
            }
            j = index;
            while(j < N && (j==index || s[j]!=i))
            {
                if(s[j]==i)
                {
                    temp++;
                }
                else
                {
                    temp--;
                }
                mp[temp]++;
                j++;
            }
            if(j==N)
            {
                break;
            }
            if(mp[0] == 0)
            mp[0] = 1;
            index = j;
        }
    }
    return ans;
}

void solve()
{
    int n;
    cin>>n;
    assert(1<=n&&n<=1e5);
    int k;
    cin>>k;
    assert(-1e5<=k&&k<=1e5);
    string s;
    cin>>s;
    for(int i = 0; i < n; ++i)
    {
        assert('a'<=s[i]&&s[i]<='z');
    }
    long long int ans = goodCharacters(n, k, s);
    cout<<ans<<endl;
}

int main() {
    int t;
    cin>>t;
    assert(1<=t && t<=100);
    while(t--)
    {
    solve();
    }
}
Editorialist's code (Python)
import string
for _ in range(int(input())):
	n, k = map(int, input().split())
	s = input()
	ans = 0
	for c in string.ascii_lowercase:
		mp, pref = {}, 0
		to_insert = [1]
		for i in range(n):
			if s[i] == c:
				pref += 1
				for x in to_insert:
					if x in mp: mp[x] += 1
					else: mp[x] = 1
				to_insert.clear()

			if 2*pref - i - k in mp: ans += mp[2*pref - i - k]
			to_insert.append(2*pref - i)
	print(ans)
1 Like

hello @iceknight1093 ,
first of all thanks for these super editorials.

can you please summarize the setter’s solution especially temp and temp1 var
i know he/she followed the same idea but still not upto the mark.

The idea is the exact same, really.

temp1 in the code is the value of 2\cdot\text{pref}_c[j]-j. You can see that when you move from j to j+1, this quantity either increases by 1 or decreases by 1, depending on whether S_{j+1} equals c or not.
So instead of computing prefix sums independently, the setter’s code computes this value directly.
temp is also the exact same quantity.

It’s just a slight difference in implementation: I chose to delay updates by putting them all into a vector and performing everything at the same time when necessary; while the setter instead uses a 2-pointer-like approach.
You can see that the first while loop computes the answers for a bunch of indices, and then the second while loop updates the values for the exact same set of indices into the map.

helpful as always :wink:

@iceknight1093 why you are storing ‘1’ in the array used for storing update in beginning for each character ?

My code uses 0-indexing, so that corresponds to 2\cdot\text{pref}_c[-1] - (-1) = 0+1 = 1, which is needed to consider prefixes as subarrays.

If you use 1-indexing you’d store 0 instead.