RCTEXSCC - Editorial

radoslav192 · January 6, 2021, 4:51pm

PROBLEM LINK:

Author: Temirlan Baibolov

Tester & Editorialist: Radoslav Dimitrov

DIFFICULTY:

PREREQUISITES:

Probability, Euler’s Formula

PROBLEM:

Given a M \times N matrix of random integers between 1 and K, find the expected number of strongly connected components if we have a directed edges between every two adjacent cells iff the value in the first cell is greater than or equal to the value in the second cell.

Quick Explanation

We notice that every strongly connected component will have all of its vertices/cells equal to the same value.
Then we can just solve the problem of finding the number of connected components if the graph was undirected and we had edges iff two adjacent cells had the same value. This has a simple solution because we can use Euler’s formula for planar graphs.

Explanation

We’ll first show why our problem is equivalent to counting the number of connected components in the undirected graph formed by edges between cells with the same value. Let’s prove this by contradiction:

Assume there are two adjacent cells (x_1, y_1) and (x_2, y_2) that are in the same strongly connected component and A_{x_1, y_1} < A_{x_2, y_2}. Then by the definition of strong connectivity, there should exist a path from (x_1, y_1) to (x_2, y_2). However, all edges “decrease” the value (i.e. they go from a larger value to a lower one). This means that all cells reachable from (x_1, y_1) will have a value of at most A_{x_1, y_1}. Therefore, (x_2, y_2) is not reachable from (x_1, y_1), because A_{x_1, y_1} < A_{x_2, y_2}. This contradicts with our assumption of the cells being in the same strongly connected component$.

From the above proof, it follows that two cells can be in the same strongly connected component iff their values are the same (then we have a bidirectional edge). This by itself is equivalent to counting the number of connected components in an undirected graph.

To solve this new problem (the undirected version), we first notice that the graph is planar by construction. This is useful, because of one of the most popular theorems for planar graphs - Euler’s formula. It states that v - e + f = 2, where v is the number of vertices in the graph, e is the number of edges and f is the number of faces. This formula works for any connected planar graph, but it can be easily generalised for graphs with more than one connected component (which is our case). Let k be the number of connected components. Then the generalisation states that v - e + f = 1 + k.

Let’s get back to our problem. We are interested in \mathop{\mathbb{E}}[k] = \mathop{\mathbb{E}}[v - e + f - 1]. Using linearity of expectation we get \mathop{\mathbb{E}}[k] = \mathop{\mathbb{E}}[v] - \mathop{\mathbb{E}}[e] + \mathop{\mathbb{E}}[f] - \mathop{\mathbb{E}}[1]. Now we’re only left with calculating the different terms of the previous equation:

\mathop{\mathbb{E}}[v] is the expected number of vertices. This is clearly always constants, so \mathop{\mathbb{E}}[v] = N * M.
\mathop{\mathbb{E}}[e] is the expected number of edges. We’ll use the contribution idea (with linearity of expectation) to calculate it. Consider an edge between two different cells. It’ll exist iff the values in the two cells are the same. Therefore, the probability of it existing is \frac{1}{K}, as we can think of this as if we’ve fixed the first value and then finding the probability of the second one being the same. Now we can simply use linearity of expectation to get that \mathop{\mathbb{E}}[e] = \frac{N * (M - 1) + (N - 1) * M}{K}, because we have N * (M - 1) + (N - 1) * M edges and the contribution of each one is equal to \frac{1}{K}.
\mathop{\mathbb{E}}[f] is the expected number of faces. The main observation here is that we can only have a face that looks like a 2 \times 2 square with the same value. Also we shouldn’t forget that we always have 1 additional face on the “outside”. Clearly, that M = 1, we have \mathop{\mathbb{E}}[f] = 1, because the only face is the “outside” one. Now let’s consider the M = 2 case. Then we can again use linearity of expectation. The number of 2 \times 2 squares is N - 1 and the probability of each of them forming a face is \frac{1}{K^3} (using the same idea of fixing one of the values). Then \mathop{\mathbb{E}}[f] = \frac{N - 1}{K^3} + 1.
\mathop{\mathbb{E}}[1] = 1. This one is quite obvious.

Now we are pretty much done. We just calculate the 4 above-mentioned values in constant* time and then fine the required expectation for the number of connected components.

*(we have to find the inverse of K, so the complexity actually logarithmic).

Additional comments

The given constraints might look a bit weird, because they allow linear solutions. This is because initially the author’s solution was using FFT to calculate the number of configurations with 1, 2, ... , N * M connected components. This didn’t look very natural, so the problem was converted to finding the expectation for the number of connected components (where the intended solution was again FFT). Then while testing the problem, I solved it with the Euler’s formula, but we decided to still let the FFT/some other linear solutions pass and so we didn’t increase the constraints to 10^9.

Also, there is a way to solve the M = 3 case (which wasn’t included in the problem) again using the Euler’s formula solution, as then we only need to additionally count the 3 \times L faces which always look like a border with a “hole” inside. An open question would be whether it’s possible to count the number of components for larger M's. If you have some ideas, I’m open to discuss!

Tester's solution

#include <bits/stdc++.h>
#define endl '\n'

#define SZ(x) ((int)x.size())
#define ALL(V) V.begin(), V.end()
#define L_B lower_bound
#define U_B upper_bound
#define pb push_back

using namespace std;
template<class T, class T1> int chkmin(T &x, const T1 &y) { return x > y ? x = y, 1 : 0; }
template<class T, class T1> int chkmax(T &x, const T1 &y) { return x < y ? x = y, 1 : 0; }
const int MAXN = (1 << 20);
const int mod = 998244353;

int pw(int x, int p) {
	int r = 1;
	while(p) {
		if(p & 1) r = r * 1ll * x % mod;
		x = x * 1ll * x % mod;
		p >>= 1;
	}

	return r;
}

int n, m, k;

void read() {
	cin >> m >> n >> k;
}

void solve_1() {
	int ik_c = (1 - pw(k, mod - 2) + mod) % mod;
	int ans = (1ll + ik_c * 1ll * (n - 1)) % mod;
	cout << ans << endl;
}

void fix(int &x) {
	if(x >= mod) x -= mod;
	if(x < 0) x += mod;
}

void solve_2() {
	int ik = pw(k, mod - 2);
	// We'll use Euler's theorem:
	// v - e + f = 1 + k
	// E[k] = 2 * n - 1 - E[e] + E[f]
	// 
	// E[e] = ik * (3 * n - 1)      

	int answer = 2 * n - 1;
	answer -= ik * 1ll * (3ll * n - 2) % mod; fix(answer);
	int Ef = ik * 1ll * ik % mod;
	Ef = Ef * 1ll * ik % mod;
	Ef = Ef * 1ll * (n - 1) % mod;
	Ef += 1; // main face

	answer += Ef; fix(answer);
	cout << answer << endl;
}

void solve() {
	if(m == 1) solve_1();
	else solve_2();
}

int main() {
	ios_base::sync_with_stdio(false);
	cin.tie(nullptr);

	read();
	solve();
	return 0;
}

VIDEO EDITORIAL:

anon10218360 · January 11, 2021, 9:53am

\frac{p}{q}=\frac{k+(n-1)(k-1)}{k} for m=1

\frac{p}{q}=\frac{k^2(2k-1)+(n-1)*(2k^3-3k^2+1)}{k^3} for m=2

akshitm16 · January 11, 2021, 9:53am

I wrote an N*K^2 dp getting inspiration from @carre 's NRWP subtask 2 where K^2 states for each i can be easily converted to only 2 states giving us O(N) solution- implementation.

akamoha · January 11, 2021, 11:16am

I solved the first subtask with an M*N² DP solution and then ran the code for all N ranging from 1 to ~200 and printed the outputs locally. I observed a pattern: ans(N) for N>3 can be expressed as ans(2) + (ans(3)-ans(2))*(N-2)

So for the bigger subtasks with M=2, I simply calculated answer for N=3 and N=2 and then used this formula for the given N. Not the best way to solve the problem because it allowed me to get AC without really understanding the underlying theory about Euler’s formula (link to my implementation).

But learnt something from reading the editorial. Nice question and neat editorial!

lemongrab · January 11, 2021, 11:52am

Very simple and innovative. During contest I tried to come up with NxM dp without any luck. Thank you for editorial!

dgupta0812 · January 11, 2021, 3:45pm

Can you please explain it a little bit more ?

akshitm16 · January 11, 2021, 3:51pm

I’d love to but in a few hours …I’ll write in detail.

dhrub_kumar · January 11, 2021, 3:54pm

What is linearity of expectation?

dgupta0812 · January 11, 2021, 3:58pm

Okay, Will be waiting for it

carre · January 11, 2021, 4:11pm

It’s great that I could have inspired you, sadly I didn’t inspire myself and wrote a really ugly solution

Congratulations on your approach!

udayan14 · January 11, 2021, 5:57pm

I got some hint from this paper. I managed to derive a formula using the methods stated in it. Find the increase in the expected number of connected components after adding one column. Cool Problem

peresz · January 11, 2021, 9:31pm

You can also solve this problem using elementary probability theory. The M = 1 case is pretty easy:

Let random variable I_i = 1 if A[i] \neq A[i+1] and 0 otherwise. Filling up the one dimensional array A from 1 to N, you can notice that every time a new component is created if and only if A[i] \neq A[i+1]. So the number of components is 1 + \sum_{i=1}^{N-1} I_i. Since the variables inside the sum are independent, the sum itself is a binomial distribution with parameters N-1 and 1 - 1/K. Formally, \sum_{i=1}^{N-1} I_i \sim B(N-1, 1 - 1/K). See Binomial distribution on Wikipedia. We know that the expectation value of the binomial distribution is the product of the two parameters so the expected number of components are: \displaystyle 1 + (N-1) \cdot \frac{K-1}{K}.

Notice that it is in the form of a constant plus (N-1) times a value that only depends on K:
c + (N-1) \cdot f(K).

Update: The discussions below this point are only valid for M=2, which has been pointed out in the comment below. I apologize for my mistake.

This is also true for higher values of M although the calculation needs a bit more work than using Euler’s theorem. The observation is that, while filling the matrix column by column, the increase in the number of components only depends on the values of the last column and the newly added column. That is, it only depends on M and K. For a fixed M, the dependence is only on K. So for M=2 we can determine f(K) by studying the configurations of 4 random variables. The constant c in the above expression is the expectation for one column which we have already calculated above.

So, I think it is possible to solve this for higher values of M but it will get complicated.

radoslav192 · January 11, 2021, 11:00pm

I don’t think this can be generalised for larger M's. This is mainly because of the “The observation is that, while filling the matrix column by column”. That’s indeed true for the M = 2 case, but for larger values we can actually have something like this:

1111
0001
0111

Consider the first 2 columns. We have no way of figuring out that the cells (1, 1) and (3, 2) will be in the same connected component based only on two adjacent rows.

peresz · January 12, 2021, 12:00am

Yes, you are right. I incorrectly generalized the M=2 case to higher. So, it is a good brain teaser to think about the M \geq 3 case.

ashu_3916 · January 12, 2021, 1:07pm

Can anyone help me understanding the time complexity accepted for various subtasks . How do we come to know what should be the time complexity to get a particular subtask accepted. Can anyone explain with current question as an example , it would be great help for a newbie like me.

akamoha · January 12, 2021, 1:47pm

Hi @ashu_3916 take a look at @betlista’s post in this thread: how many approx loops are allowed in 1 sec time limit.

With that in mind, lets take a look at what time complexities would be accepted for various subtasks of this problem:
Subtask #1: since N and K are very small (both less than 5), solutions with really bad time complexity like exponential will also be accepted.
Subtask #2: Here N can be as large as 10⁵ and K as large as 10⁸. So O(N) would work, O(K) probably won’t work and O(NK) or anything bigger will definitely not work.
Subtask #3: Here N and K can be at most 500, so complexities like O(NK²), O(N²K) or lower will work because these will amount to ~10⁷ operations but higher than that like O(NK³) won’t work.
Subtask #4: Constraints on N and K are the same as the second subtask but with M=2. So again, O(N) would work but O(NK) or bigger will not work.

So with the given constraints, the worst time complexity that could give you an AC for all subtasks and a 100 score for this problem is linear in N i.e. O(N).

Hope that helps!

akshitm16 · January 12, 2021, 2:39pm

Here it is!

ashu_3916 · January 12, 2021, 2:54pm

Thanks

dgupta0812 · January 12, 2021, 7:01pm

Thanks. It’s a really well written editorial.

shobhit10058 · January 13, 2021, 5:33am

I think I also a wrote a dp solution like you but looks really ugly as I manually added all possible cases. CodeChef: Practical coding for everyone