COUNTRBS - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: das_sayantan
Tester: kaori1
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Tree hashing

PROBLEM:

A regular bracket sequence is called good if, when its first and last characters are deleted, the remaining characters still form a regular bracket sequence.

You’re given a good RBS, S. Count the number of pairs of disjoint substrings of S that equal each other, and are also good regular bracket sequences.

EXPLANATION:

While this seems like a string problem, it can be converted into a tree problem!

Specifically, it’s possible to map any regular bracket sequence (henceforth RBS) to a tree as follows:

  • Initially, there’s only a single vertex, the root.
  • Then, for each i from 1 to N,
    • If S_i = \text{'('}, create a new child of the current vertex, and move to it.
    • If S_i = \text{')'}, move up to the parent of the current vertex.

This creates a tree on \frac{N}{2} vertices, and in fact the RBS gives us an Euler tour of this tree.

Further, observe that any substring of the RBS corresponds to a walk on this tree.
In fact, we have a slightly stronger criterion: any substring of the RBS which is itself an RBS, corresponds to starting at some vertex u, completely visiting some subtrees of its children, and then returning to u - which should be clear given how the tree was constructed.

Now, recall that we care about good RBS, i.e, RBS for which when the first and last characters are removed, the remainder is still an RBS.
It can be seen that a good RBS corresponds to not just a vertex and some subtrees of its children, but instead to (essentially) an entire subtree of the tree!

How?

As noted earlier, any RBS corresponds to starting at some vertex u and following the Euler tour of the tree for some contiguous subset of children of u, till you end back at u.

For a good RBS however, after the first move (which goes to a child, say v), the next N-2 moves themselves form an RBS; meaning after them we’ll end up at v again - and then finally move back up to u.
However, this also means that we’ve visited exactly the entire subtree of v, since by construction we’ll never go back to v once we’ve moved on from it.

In other words, what we have is exactly the full subtree of v, along with the edge joining it to its parent!

It’s easy to see that the above association works both ways - that is, take any non-root vertex u, and its subtree along with the edge joining it to its parent will correspond to a substring that’s a good RBS in S.
This also tells us that good RBS-es cannot intersect each other non-trivially - they will either be completely disjoint, or one will be contained inside another.
In particular, this means that two equal good RBS-es will be fully disjoint, so it’s enough to just count all pairs of them!

As we’ve established above, counting pairs of good RBS-es essentially means counting pairs of “equal” subtrees in the constructed tree.
This is now a classical problem (commonly called rooted tree isomorphism), which can be solved by hashing - for example several methods can be found here.

Note that the standard definition for rooted tree isomorphism allows for rearrangement of children, but in this case we do not (since order of children is quite important here).
So, make sure to choose a method of that allows for this to be done - in most cases, this is as simple as just not sorting the hashes of the children subtrees.

TIME COMPLEXITY:

\mathcal{O}(N) or \mathcal{O}(N\log N) per testcase.

CODE:

Author's code (C++)
#include<bits/stdc++.h>
#define int long long
#define fastio ios_base::sync_with_stdio(false);cin.tie(0);
const int M=1e9+7;
using namespace std;

void test_cases(){
	int n;
	string s;
	cin >> n >> s;
	n/=2;
	vector<vector<int>> adj(n);
	vector<int> parent(n);
	int to=0;
	
	// We need to build the tree corresponding to the good regular bracket sequence first.
	function<void(int,int)> build = [&](int i,int cur){
		if(i == 2*n){
			return;
		}
		if(s[i] == '('){
			if(cur != -1){
				/* The current node is not the root node, create a new node and add an edge between
				 * the current and the newly created node with the current node being the parent of
				 * the newly created node.
				 */
				 
				parent[to]=cur;
				adj[cur].push_back(to);
				adj[to].push_back(cur);
			}

			to++;
			build(i+1,to-1);
		}
		else{
			/*
			 * This is a closing brace, that means subtree corresponding to current node has been completed and we are exiting this node, so we need to
			 * get back to parent of the current node and do further build for the subtree of parent of current node.
			 */
			
			build(i+1,parent[cur]);
		}
	};
	build(0,-1);
	int last=1;
	vector<int> hash(n);

	// This contains the information of already seen hash sequences and hash value corresponding to that sequence.
	map<vector<int>, int> have;
	
	// Compute the hash values for each of the subtree and then maintain the count of same hash values.
	// We will only consider disjoint subtrees as two nodes one being in the subtree of the other can't have the same hash value.
	function<void(int,int)> dfs = [&](int s,int p){
		vector<int> children;
		for(int x: adj[s]){
			if(x != p){
				dfs(x, s);
				children.push_back(hash[x]);
			}
		}

		if(!have.count(children)){
			have[children]=last;
			hash[s]=last++;
		}
		else{
			hash[s]=have[children];
		}
	};
	dfs(0, -1);
	map<int, int> cnt;
	for(int x: hash){
		++cnt[x];
	}
	
	int ans=0;
	for(auto& x: cnt){
		int f=x.second;

		// f is the number of subtrees having the same hash values, there are f * (f-1) / 2 ways to choose two subtrees
		// that have the same hash value.
		ans+=f*(f-1)/2;
	}

	cout << ans << endl;
	
}

int32_t main(){
	fastio;
	int t=1;
	cin >> t;
	while(t--){
		test_cases();
	}
}
Tester's code (C++)
#include<bits/stdc++.h>

using namespace std;

#define all(x) begin(x), end(x)
#define sz(x) static_cast<int>((x).size())
#define int long long
typedef long long ll;


const int mod[6] = {998244353, (int)1e9 + 7, (int)1e9 + 9, (int)1e9 + 21, (int)1e9 + 33, (int)1e9 + 87};

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
template<typename T>
T rand(T a, T b){
    return uniform_int_distribution<T>(a, b)(rng);
}

int power(int a, int b, int mod) {
        int res = 1;
        for (; b; b /= 2, a = a * a % mod) {
                if (b & 1) res = res * a % mod;
        }
        return res;
}

signed main() {

        ios::sync_with_stdio(0);
        cin.tie(0);


        int t;
        cin >> t;


        while (t--) {

                int n;
                cin >> n;
                string s;
                cin >> s;

                vector<vector<int>> pw(n, vector<int>(5));
                vector<vector<int>> p(n, vector<int>(5));
           
                for (int j = 0; j < 5; j++) {
                        int x1 = rand(1ll, (int)1e9);
                        int x2 = rand(1ll, (int)1e9);
                        int cur = 1;
                        int inv = power(mod[j], mod[j + 1] - 2, mod[j + 1]);
                        int curinv = 1;
                        for (int i = 0; i < n; i++) {
                                p[i][j] = ((i ? p[i - 1][j] : 0) + (s[i] == '(' ? x1 : x2) * cur) % mod[j + 1]; 
                                pw[i][j] = curinv;
                                curinv = curinv * inv % mod[j + 1];
                                cur = cur * mod[j] % mod[j + 1];
                        }
                }

                map<array<int, 5>, int> m;
                vector<int> v;
                for (int i = 0; i < n; i++) {
                        if (s[i] == '(') {
                                v.push_back(i);
                                continue;
                        }
                        int j = v.back();
                        v.pop_back();
                        array<int, 5> f;
                        for (int k = 0; k < 5; k++) {
                                f[k] = (p[i][k] - (j ? p[j - 1][k] : 0) + mod[k + 1]) * pw[j][k] % mod[k + 1];
                        }
                        m[f]++;
                }
                int ans = 0;
                for (auto [_, c] : m) ans += c * (c - 1) / 2;
                cout << ans << "\n";

        }
        
        
}

We don’t need tree hashing for this problem.
Each subtree of node corresponds to a substring of original string, so we can just take substring hash.

5 Likes