EXPCH - Editorial

tmwilliamlin · February 21, 2020, 2:50pm

If you prefer a video explanation: EXPCH Video Solution - February Long Challenge 2020

PROBLEM LINK:

Practice
Div-2 Contest
Div-1 Contest

Author: Foyaz Akanda
Tester: Radoslav Dimitrov
Editorialist: William Lin

DIFFICULTY:

Medium

PREREQUISITES:

Combinatorics, Dynamic Programming

PROBLEM:

You are given a string S of \text{'('}, \text{'*'}, and \text{')'} and you choose a substring T uniformly at random. Process T from left to right. Whenever the number of closing parentheses in the current prefix is greater than the number of opening parentheses, change the closing parenthesis to an opening parenthesis. Find the expected number of changes.

SAMPLE EXPLANATION:

Some people were complaining about not having sample explanations, so here it is.

What does expected value mean?

Don't be lazy, did you try to google search it?

Expected value basically means an average value over all cases weighted according to the probability that the case occurs.

Since we choose T uniformly at random, meaning that each possible substring is equally likely, our average will be unweighted.

So, we need to find the average number of changes over all cases, and we can find the sum over all cases, and divide by the number of cases.

In the sample, S=\text{"((())"}, and there are 15 possible substrings. I’ve listed the answer for each one of them:

"(" = 0
"(" = 0
"(" = 0
")" = 1
")" = 1
"((" = 0
"((" = 0
"()" = 0
"))" = 1
"(((" = 0
"(()" = 0
"())" = 1
"((()" = 0
"(())" = 0
"((())" = 0

The sum is 4 and the average is \frac{4}{15}, so \frac{4}{15} is our answer.

What does multiplicative inverse mean?"

Don't be lazy, did you try to google search it?

We can find the multiplicative inverse of Q with the formula Q^{M-2} \mod M for any prime M (like 10^9+7 in the problem).

For our final answer, first we calculate the multiplicative inverse of 15 with the formula above, which gives 466666670.

We multiply that by P=4 modulo 10^9+7 and we get the final answer of 866666673.

QUICK EXPLANATION 1

Use the contribution technique / linearity of expectation and find the number of subarrays which change i for each i. We can calculate dp_{i, j} = the number of starting indexes which have n_o-n_c=j after processing the first i elements. We can update this DP fast using lazy shifting techniques.

QUICK EXPLANATION 2

Let dp_i be the sum of answers for all subarrays starting at i. If S_i=\text{'('}, find the first j which causes n_o-n_c to be negative, and dp_i=dp_j. If S_i=\text{'*'}, dp_i=dp_{i+1}. Otherwise, find the first j which causes n_o-n_c to be negative after we change S_i to \text{'('}, and dp_i=dp_j+n+1-i. Finding j for the transitions can be done in constant time by maintaining a stack of unmatched closing parentheses while processing from right to left.

EXPLANATION 1

Contribution Technique / Linearity of Expectation

Instead of finding the total sum of answers over all subarrays, let’s focus on how much each index i contributes to the total sum. Index i's contribution to the total sum is the number of subarrays which cause i to change. To find the total sum, we can add up the contribution of all indexes instead.

Explained with Sample

In the sample, S=\text{"((())"}, and only indexes 3 and 4 can contribute to the sum since those are the only closing parentheses.

We can check that the number of subarrays which cause index 3 to change is 2. Those subarrays are [3, 3] and [3, 4].

We can check that the number of subarrays which cause index 4 to change is 2. Those subarrays are [2, 4] and [4, 4].

We add up 2+2=4, which is the total sum over all subarrays.

Check out the “related problems” section below.

Given that we want to use contribution technique on this problem, how do we find the number of subarrays which change a certain index i?

A subarray [l, r] changes i if, when we process all characters from l to i (including changing closing parentheses to opening parentheses when necessary), n_o-n_c is negative right before we perform any changes to i. Notice that as long as r\ge i, the choice of r does not affect whether i is changed or not.

Thus, we care about two things: the starting index l of a subarray and the value of n_o-n_c when we reach index i.

This suggests a DP solution: dp_{i, j} is the number of l such that the value n_o-n_c is j after processing the first i characters.

We need to figure out the DP transitions, which involves some casework:

Transitions

If S_i=\text{'*'}:

dp_{i, j}=dp_{i-1, j}, since the value n_o-n_c doesn’t change.

We also create a new subarray [i, i], so we should add 1 to either dp_{i-1, 0} before we transition or dp_{i, 0} after we transition.

If S_i=\text{'('}:

dp_{i, j+1}=dp_{i-1, j}, since the value n_o-n_c increases by 1.

We also create a new subarray [i, i], so we should add 1 to either dp_{i-1, 0} before we transition or dp_{i, 1} after we transition.

If S_i=\text{')'}:

dp_{i, j-1}=dp_{i-1, j}, since the value n_o-n_c decreases by 1.

We also create a new subarray [i, i], so we should add 1 to either dp_{i-1, 0} before we transition or dp_{i, -1} after we transition.

We need to do something additional in this case: since n_o-n_c might become negative (-1 to be precise), we may need to change the last closing parenthesis into an opening parenthesis. After we perform the change, n_o-n_c will change from -1 to 1, so we need this additional step:

First, add dp_{i, -1} to dp_{i, 1}. Then, set dp_{i, -1} to 0.

We have our DP transitions, but we still need to figure out how to calculate the answer. Remember that the criteria for i to change is that n_o-n_c is negative (-1 to be precise) right before we perform any changes to i.

Thus, the number of ways to choose l for a subarray which changes i is dp_{i, -1} before we change i. The number of ways to choose r\ge i is n-i, so we should add dp_{i, -1} \cdot (n-i) to the answer.

Our transition for the case S_i=\text{')'} is modified below:

Transition

Add 1 to dp_{i-1, 0}.
dp_{i, j-1}=dp_{i-1, j}.
Add dp_{i, -1} \cdot (n-i) to the answer.
Add dp_{i, -1} to dp_{i, 1}.
Set dp_{i, -1} to 0.

After we have found the total sum, we find the expected value by dividing the sum by the number of subarrays, which is \frac{n(n+1)}{2}.

Sadly, this solution is O(n^2), so let’s try to find the bottleneck of this solution and improve it. Ideally, we want transitions to be constant time, but the shifting operations (such as dp_{i, j}=dp_{i-1, j}) take linear time.

Notice that the array basically remains the same after shifting, it’s just the indexes which are changing. Also, we don’t need the values of dp_{i-1} and all previous rows after we transition to i.

Let dp_{i, j+o_i}=dp_{i-1, j}. What if, we never actually calculate dp_i, but whenever we want to retrieve or modify the value dp_{i, j}, we actually just do it on dp_{i-1, j-o_i}?

We save the O(n) transition and all other operations basically run in constant time.

Let’s go further: We don’t store dp_{i-1} as well, and if we want dp_{i, j}, we just take the value dp_{i-2, j-o_i-o_{i-1}}. Why not just delete all dp_i except for dp_0=d, and whenever we want the value dp_{i, j}, we just look at d_{j+o} for some offset o? This allows to perform the shift transitions in constant time!

Using this trick, we shoule maintain an offset o while we iterate from left to right, and our new transitions look like:

Transitions

Add 1 to d_{o} (which was dp_{i-1, 0}).

If S_i=\text{'('}:

Subtract 1 from o to simulate dp_{i, j+1}=dp_{i-1, j}.

If S_i=\text{')'}:

Add 1 to o to simulate dp_{i, j-1}=dp_{i-1, j}.
Add d_{o-1} \cdot (n-i) (which was dp_{i, -1} \cdot (n-i)) to the answer.
Add d_{o-1} to d_{o+1} (which was dp_{i, 1}).
Set d_{o-1} to 0.

In the implementation, in order to prevent negative indexes, d should be an array of size 2n+1 and o should start from n.

EXPLANATION 2

Instead of doing a DP from left to right, let’s try a DP from right to left. Let dp_i = the sum of answers for all subarrays starting at i. Our final answer is the sum of all dp_i divided by \frac{n(n+1)}{2}, the number of subarrays. We should do some casework to find the transitions:

Transitions

If S_i=\text{'*'}:

We can ignore S_i as it doesn’t do anything, so dp_i=dp_{i+1}.

If S_i=\text{'('}:

Let’s somehow find the first index j such that the subarray [i, j] contains more closing parentheses than opening parentheses.

By our definition of j, since it is the first such index, all indexes before j won’t be changed and won’t contribute to the answer.

Also, we know that in the subarray [i, j-1], the number of the two types of parentheses must be equal. If there were more opening parentheses than closing ones, then the number of closing parentheses could not possibly have become greater after adding index j. If there were more closing parenthese than opening ones, this contradicts our assumption that j is the first index to have this property since j-1 also does.

Since all parentheses in [i, j-1] cancel out, the indexes which are changed would be the same as if we started from index j instead. Thus, dp_i=dp_j.

If there is no such j, dp_i=0.

If S_i=\text{')'}:

S_i definitely will be changed, and all subarrays starting at i will cause i to change, so we add n-i to dp_i.

Since we change S_i to \text{'('}, let’s temporarily pretend that S_i=\text{'('}. Then, we can perform the transition for S_i=\text{'('} as described above.

To complete the solution, we just need a fast way to find j in the transitions. Notice the similarity of this problem to parentheses matching: if all parentheses in [i, j-1] cancel out, then they are balanced.

We should use a standard algorithm for parentheses matching but process from right to left because that’s the direction of our DP. So, we should maintain a stack of unmatched \text{')'} s.

Whenever we encounter a \text{'('}, we should match it with the top of the stack (if the stack is empty, don’t do anything). Then, as the top of the stack is the first unmatched \text{')'}, it is the first j which causes the number of closing parentheses to be greater than the number of opening ones.

SOLUTIONS:

Setter's Solution

#include <bits/stdc++.h>
using namespace std;
 
#define mod 1000000007
#define ll long long
 
const int N = 10000050;
 
ll bm(ll b,ll p)
{
    ll ret=1;
    while(p)
    {
        if(p&1) ret=(ret*b)%mod;
        p/=2;
        b=(b*b)%mod;
    }
    return ret;
}
int mp[3*N];
char str[N];
int csum[N];
int dp[N];
 
int main()
{
    int t;
    scanf("%d",&t);
    assert(t>=1 && t<=10);
    int totN=0;
    while(t--)
    {
        int n;
        scanf("%d", &n);
        assert(n>=1 && n<=10000000);
        totN+=n;
        for(int i=0; i<=3*n; i++) mp[i]=0;
        scanf("%s",str);
        assert(strlen(str)==n);
        for(int i = 0; i<n; i++){
            assert(str[i]=='(' || str[i]=='*' || str[i]==')');
            int v=-1;
            if(str[i]=='(') v=1;
            else if(str[i]=='*') v=0;
            if(i) csum[i] = csum[i-1] + v;
            else csum[i] = v;
        }
        dp[0]=0;
        ll ans = 0;
        for(int i = n-1; i>=0; i--){
            dp[i] = 0;
            dp[i] += (n-i) * (str[i]==')');
            int val;
            if(str[i]!='*') val = csum[i]-1;
            else val = csum[i];
            int nw=val-1;
            if(nw<0) nw+=2*n;
            int id=mp[nw];
            if(id) dp[i] += dp[id];
            nw=csum[i];
            if(nw<0) nw+=2*n;
            mp[nw]=i;
            dp[i]%=mod;
            ans += dp[i];
            ans %= mod;
        }
        ans = (2LL * ans)%mod;
        ans= (1LL*ans*bm(n,mod-2))%mod;
        ans= (1LL*ans*bm(n+1,mod-2))%mod;
        printf("%lld\n", ans);
    }
    assert(totN>=1 && totN<=10000000);
    return 0;
}

Tester's Solution

#include <bits/stdc++.h>
#define endl '\n'
 
#define SZ(x) ((int)x.size())
#define ALL(V) V.begin(), V.end()
#define L_B lower_bound
#define U_B upper_bound
#define pb push_back
 
using namespace std;
template<class T, class T1> int chkmin(T &x, const T1 &y) { return x > y ? x = y, 1 : 0; }
template<class T, class T1> int chkmax(T &x, const T1 &y) { return x < y ? x = y, 1 : 0; }
const int MAXN = (int)1e7 + 42;
const int mod = (int)1e9 + 7;
 
int pw(int x, int p) {
	int r = 1;
	while(p) {
		if(p & 1) r = r * 1ll * x % mod;
		x = x * 1ll * x % mod;
		p >>= 1;
	}
 
	return r;
}
 
int n;
string s;
 
void read() {
	cin >> n >> s;
}
 
int ret[MAXN];
 
void solve() {
	stack<int> st;
 
	int ans = 0;
	ret[n] = 0;
	for(int i = n - 1; i >= 0; i--) {
		if(s[i] == '(') {
			if(!st.empty()) {
				st.pop();
			}
			
			int first_fail = st.empty() ? n : st.top();
			ret[i] = ret[first_fail];
		} else if(s[i] == ')') {
			if(st.empty()) {
				ret[i] = (n - i);
			} else {
				int add_again = st.top();
				st.pop();
				ret[i] = ret[st.empty() ? n : st.top()];
				ret[i] = (ret[i] + n - i) % mod;
				st.push(add_again);
			}
 
			st.push(i);
		} else {
			ret[i] = ret[i + 1];
		}
 
		ans = (ans + ret[i]) % mod;
	}
 
	ans = ans * 1ll * pw((n * 1ll * (n + 1) / 2ll) % mod, mod - 2) % mod;
	cout << ans << endl;
}
 
int main() {
	ios_base::sync_with_stdio(false);
	cin.tie(NULL);
 
	int T;
	cin >> T;
 
	while(T--) {
		read();
		solve();
	}
 
	return 0;
}

Editorialist's Solution 1

#include <bits/stdc++.h>
using namespace std;

#define ll long long

const int mxN=1e7, M=1e9+7;
int n, d[2*mxN+1];
string s;

ll powM(ll b, int p) {
	ll r=1;
	for(; p; b=b*b%M, p/=2)
		if(p&1)
			r=r*b%M;
	return r;
}

void solve() {
	//input
	cin >> n >> s;

	memset(d, 0, 4*(2*n+1));
	int o=n;
	ll ans=0;
	for(int i=0; i<n; ++i) {
		//add a sum of 0
		++d[o];
		if(s[i]=='(') {
			//shift c by 1
			--o;
		} else if(s[i]==')') {
			//shift c by -1
			++o;
			//process negative sums
			ans+=(ll)d[o-1]*(n-i);
			d[o+1]+=d[o-1];
			d[o-1]=0;
		}
	}
	//divide by n*(n+1)/2
	ans=ans%M*powM((ll)n*(n+1)/2%M, M-2)%M;
	cout << ans << "\n";
}

int main() {
	ios::sync_with_stdio(0);
	cin.tie(0);

	int t;
	cin >> t;
	while(t--)
		solve();
}

Editorialist's Solution 2

#include <bits/stdc++.h>
using namespace std;

#define ll long long

const int mxN=1e7, M=1e9+7;
int n;
ll dp[mxN+1];
string s;

ll powM(ll b, int p) {
	ll r=1;
	for(; p; b=b*b%M, p/=2)
		if(p&1)
			r=r*b%M;
	return r;
}

void solve() {
	//input
	cin >> n >> s;

	ll ans=0;
	dp[n]=0;
	stack<int> st;
	for(int i=n-1; ~i; --i) {
		if(s[i]=='(') {
			//delete a matching ')'
			if(!st.empty())
				st.pop();
			//the first ')' which can't be matched
			int j=st.empty()?n:st.top();
			dp[i]=dp[j];
		} else if(s[i]==')') {
			//all subarrays starting at i will change i
			dp[i]=n-i;
			if(!st.empty()) {
				//temporary change i to '(' and try to match
				int temp=st.top();
				st.pop();
				//the first ')' which can't be matched
				int j=st.empty()?n:st.top();
				dp[i]+=dp[j];
				//revert the change
				st.push(temp);
			}
			//add this ')' to the stack
			st.push(i);
		} else {
			dp[i]=dp[i+1];
		}
		ans+=dp[i];
	}
	//divide by n*(n+1)/2
	ans=ans%M*powM((ll)n*(n+1)/2%M, M-2)%M;
	cout << ans << "\n";
}

int main() {
	ios::sync_with_stdio(0);
	cin.tie(0);

	int t;
	cin >> t;
	while(t--)
		solve();
}

EXPCH - Editorial

PROBLEM LINK:

DIFFICULTY:

PREREQUISITES:

PROBLEM:

SAMPLE EXPLANATION:

QUICK EXPLANATION 1

QUICK EXPLANATION 2

EXPLANATION 1

EXPLANATION 2

SOLUTIONS:

Related Problems