BINLAND - Editorial


Div-2 Contest
Div-1 Contest

Author: fmota
Tester: smelskiy
Editorialist: fmota




Data structures, queue, stack, dynamic programming


Given an empty grid, it’s necessary to handle 3 different query types, addition of a new line, where each cell has its own color (0 or 1), in the bottom of the grid, removal of the topmost line and counting of number of ways going from one of the topmost cells in the grid to one of the bottommost only passing by cells of the same color.


The problem can be interpreted as requiring a modified queue (push, pop & number of paths). If the problem required a modified stack (push, pop & number of paths), it could be solved with dynamic programming, keeping for each element in the stack, the number of ways to go to the bottom of the stack, ending in any of the possible bottom cells, this would require O(N^2) for push operation and O(1) for pop and query. There is a trick that allows to emulate a queue using two stacks (Method 2 in link), with this trick it’s possible to emulate a modified queue with 2 modified stacks, now the push and pop would have O(N^2) amortized complexity, the query requires now merging the results of both modified stacks, which can be done in O(N), using the fixed columns.


The first thing to do in this problem is to see what the add and remove operation mean, looking to the following image we can have a better idea:
In the image, the grid is rotated and represented as an array of lines, L1 represents the topmost line of the grid and LN the bottommost. Based on the image, we can see that the operations are similar to the operations present in a queue, this is an important detail to solve the problem.

Now, let’s work on calculating the number of ways, first let’s try to solve the problem disregarding add, remove operations and considering a non empty grid. So, there will be many path operations, but no add/remove operation. This version of the problem can be solved with dynamic programming, to solve all the queries we only need to calculate L_{cd}, where L_{cd} is the number of ways going from the topmost line in the c-th column going to the bottommost line in the d-th column. If the grid only has one line, L_{cd} = [c == d], because one cell can only reach itself on the same line. If the grid has M \geq 2 lines, we can consider L_{cd}^i as the result for the first i lines, knowing L_{cd}^i we can compute L_{cd}^{i+1} in O(N^2) complexity. So the overall process will take O(M*N^2), this method could be used in each query of the first subtask and solve it completely.

So far, there are no advances in how to handle add/remove operations, if the operations were stack like ones, the problem could be solved in O(Q * N^2) complexity, consider the following image:
The image follows the same format of the first image shown, but instead it considers the operations as if its stack like ones. considering the process of calculating the number of paths previously described we could store on the stack all L_{cd}^i and solve path/remove operations in O(1) and calculate the new L_{cd}^M in O(N^2) in the add operation.

There is an interesting trick that allows to emulate a queue with 2 stacks (Method 2 in link). Knowing this trick, the full problem can be solved, to be clearer, consider the following image:

Pseudo code
stack a, b

    push(b, element)

    if empty(a):
        while not empty(b):
            element = top(b)
            push(a, element)

In the trick, the queue is partitioned in two stacks, it’s only necessary stack operations and it’s possible to handle queue operations, as an element in the worst case will pass by the two stacks and going to be removed, the overall complexity to process T operations is O(T).

Using the trick of 2 stacks, the queue problem can be reduced to solving 2 times the problem in the stack, using the method previously described to handle stack problem, it’s possible to keep the information of paths in the form La_{cd} and Lb_{cd} representing the number of paths for each stack. This lead to O(N^2) amortized complexity in add/remove operations. The only remaining problem is how to calculate the overall result, taking into consideration both stacks. One way is to only calculate it when we receive a path query, using both c and d fixed by the query, we only have to consider all of the N possible points that connect one stack to the other, so the path query can be solved in O(N).

Some solutions that aren’t optmized for IO or aren’t optimized for memory maybe have problem to pass all test cases, the TL is tight, the reason is to cut off some optimized D&C solutions, but unhappily some D&C solutions were able to pass.


Setter's Solution
#include <bits/stdc++.h>
using namespace std;
template<typename T = int> vector<T> create(size_t n){ return vector<T>(n); }
template<typename T, typename... Args> auto create(size_t n, Args... args){ return vector<decltype(create<T>(args...))>(n, create<T>(args...)); }
const int mod = 1'000'000'007;
const int maxq = 300000, maxn = 20;
int w[2][maxq][maxn][maxn], stk[2];
string p[2][maxq];
int main(){
    int N, C, Q;
    cin >> N >> Q;
    auto upd_back = [&](int id){
        int n = stk[id];
        if(n >= 2){
            for(int i = 0; i < N; i++){
                for(int j = 0; j < N; j++) w[id][n - 1][i][j] = 0;
                for(int j = i-1; j <= i+1; j++){
                    if(j < 0 || j >= N) continue;
                    if(p[id][n - 1][i] != p[id][n - 2][j]) continue;
                    for(int k = 0; k < N; k++){
                        w[id][n - 1][i][k] += w[id][n - 2][j][k];
                        if(w[id][n - 1][i][k] >= mod) w[id][n - 1][i][k] -= mod;
        } else if(n == 1){
            for(int i = 0; i < N; i++) for(int j = 0; j < N; j++) w[id][0][i][j] = i == j;
        string ty; cin >> ty;
        if(ty[0] == 'a'){
            string s; cin >> s;
            p[1][stk[1]++] = s;
        } else if(ty[0] == 'r'){
            if(stk[0] == 0){
                    p[0][stk[0]++] = p[1][--stk[1]];
        } else {
            int i, j; cin >> i >> j; i--; j--;
            if(stk[1] == 0){
                cout << w[0][stk[0]-1][i][j] << '\n';
            } else if(stk[0] == 0){
                cout << w[1][stk[1]-1][j][i] << '\n';
            } else {
                int ans = 0;
                for(int k = 0; k < N; k++){
                    for(int k2 = k - 1; k2 <= k + 1; k2++){
                        if(k2 < 0 || k2 >= N) continue;
                        if(p[0][0][k] != p[1][0][k2]) continue;
                        if(!w[0][stk[0] - 1][i][k]) continue;
                        if(!w[1][stk[1] - 1][j][k2]) continue;
                        ans += (1ll * w[0][stk[0] - 1][i][k] * w[1][stk[1] - 1][j][k2]) % mod;
                        if(ans >= mod) ans -= mod;
                cout << ans << '\n';
    return 0;
Tester's solution
 O(Q * N * N)
#include <iostream>
#include <string>
#include <cstring>
#include <cassert>
#include <cstdio>
#include <algorithm>
#include <vector>
#include <map>
using namespace std;
const int N = 300005;
const int MD = 1'000'000'000 + 7;
struct query_t {
    int r;
    int source, target;
    int id;
int m, queries;
string ar[N];
int n;
map<pair<int, int>, vector<int> > dpl, dpr;
vector<int> save_l[N], save_r[N];
vector<query_t> q[N];
vector<int> ans;
int left_dp[20][20], right_dp[20][20];
void save_unique(vector<int>& w) {
    sort(w.begin(), w.end());
    vector<int> nw;
    for (int i = 0; i < w.size(); ++i) {
        if (!i || w[i] != w[i - 1])
    w = nw;

void read_queries() {
    int l = 1;
    string query_type = "";
    int x, y;
    for (int i = 1; i <= queries; ++i) {
        cin >> query_type;
        if (query_type == "add") {
            cin >> ar[++n];
        } else if (query_type == "remove") {
        } else if (query_type == "path") {
            cin >> x >> y;
            --x; --y;
            q[l].push_back({n, x, y, (int)ans.size()});
    for (int i = 1; i <= n; ++i) {
        if (!save_l[i].empty())
        if (!save_r[i].empty())
void recalc_left_dp(int l, int mid) {
    memset(left_dp, 0, sizeof(left_dp));
    for (int i = 0; i < m; ++i) {
        left_dp[i][i] = 1;
    vector<int> dp(m, 0);
    for (int i : save_l[mid]) {
        for (int j = 0; j < m; ++j)
            dp[j] = left_dp[j][i];
        dpl[{mid, i}] = dp;
    memset(right_dp, 0, sizeof(right_dp));
    for (int i = 0; i < m; ++i)
        right_dp[i][i] = 1;
    dp.resize(m, 0);
    for (int i : save_r[mid]) {
        for (int j = 0; j < m; ++j) {
            dp[j] = right_dp[j][i];
        dpr[{mid, i}] = dp;
    while (mid > l) {
        for (int i = 0; i < m; ++i) {
            vector<long long> dp(m, 0);
            for (int j = 0; j < m; ++j) {
                int cur = left_dp[i][j];
                if (j && ar[mid][j] == ar[mid - 1][j - 1])
                    dp[j - 1] += cur;
                if (ar[mid][j] == ar[mid - 1][j])
                    dp[j] += cur;
                if (j + 1 < m && ar[mid][j] == ar[mid - 1][j + 1])
                    dp[j + 1] += cur;
            for (int j = 0; j < m; ++j) {
                left_dp[i][j] = dp[j] % MD;
        for (int i : save_l[mid]) {
            vector<int> dp(m);
            for (int j = 0; j < m; ++j)
                dp[j] = left_dp[j][i];
            dpl[{mid, i}] = dp;
void push_right(int r) {
    for (int i = 0; i < m; ++i) {
        vector<long long> dp(m, 0);
        for (int j = 0; j < m; ++j) {
            int cur = right_dp[i][j];
            if (j && ar[r][j] == ar[r + 1][j - 1])
                dp[j - 1] += cur;
            if (ar[r][j] == ar[r + 1][j])
                dp[j] += cur;
            if (j + 1 < m && ar[r][j] == ar[r + 1][j + 1])
                dp[j + 1] += cur;
        for (int j = 0; j < m; ++j) {
            right_dp[i][j] = dp[j] % MD;
    for (int i : save_r[r]) {
        vector<int> dp(m);
        for (int j = 0; j < m; ++j)
            dp[j] = right_dp[j][i];
        dpr[{r, i}] = dp;
int solve_query(const vector<int>& lhs, const vector<int>& rhs) {
    __int128 result = 0;
    for (int i = 0; i < m; ++i) {
        result += lhs[i] * 1ll * rhs[i];
    result %= MD;
    return (int)result;
void process_queries() {
    int mid = -1;
    int l = -1, r = -1;
    for (int i = 1; i <= n; ++i) {
        for (query_t& query : q[i]) {
            if (i > mid) {
                recalc_left_dp(i, query.r);
                l = i;
                mid = query.r;
                r = query.r;
            while (r < query.r) {
            ans[] = solve_query(dpl[{i, query.source}],
int main(int argc, const char * argv[]) {
    #ifdef __APPLE__
    ios_base::sync_with_stdio(false); cin.tie(0); cout.tie(0);
    cin >> m >> queries;
    for (int i = 0; i < ans.size(); ++i) {
        cout << ans[i] << '\n';
    return 0;

edit: this is probably the divide-and-conquer approach that fmota wanted to prevent from passing.

I solved this problem offline, which made it a lot easier (I think).
I collected the complete grid (add strings from the add operation).
Then each query asking the number of paths is just a range query.

You can describe the number of paths from row x to the next row x+1 via a transition matrix M_x (M_x[i][j] is the number of ways to go from (x, i) to (x+1, j), this will be either 1 or 0). And going from (x_1, y_1) to (x_2, y_2) is just multiplying all those matrices between M := M_{x_1} \cdot M_{x_1+1} \cdots M_{x_2-1}, and looking at the correct index M[y_1][y_2].

An obvious solution to handle range queries efficiently, is to create a data-structure around it. Something like a segment tree, or other ones like the Sqrt tree with an even better complexity.
However those approaches are too slow. Especially the precomputation. Since you need like O(R \log R) matrix multiplications (assuming R rows) that each take O(N^3). I needed something better. Also the memory might become an issue. Doing some tricks like a prefix array is also not an option, since not all matrices are invertible.

One important observation is, that those transition matrices between two rows are band-matrices with bandwidth 1. Multiplying with such a matrix only takes O(N^2) time. So let’s use that.

I split the range queries into 3 groups:

  • Queries that are completely in the upper half of the grid (before row R/2).
  • Queries that include the row R/2.
  • Queries that are completely in the lower half of the grid (after row R/2).

I’ll focus on solving the queries of the second group first.
If we compute all suffix matrix products of the first half M_{R/2-1}, M_{R/2-2} \cdot M_{R/2-1}, M_{R/2-3} \cdot M_{R/2-2} \cdot M_{R/2-1}, \dots and all prefix matrix products of the second half M_{R/2}, M_{R/2} \cdot M_{R/2+1}, M_{R/2} \cdot M_{R/2+1} \cdot M_{R/2+2}, \dots, then we can answer each query with only one multiplication of a suffix and a prefix matrix. Computing the prefix/suffix matrices takes only O(R N^2) time (because of the faster multiplication using band matrices), and then anwering each query can be done in O(N^3). (EDIT: see further down the thread for an optimization!)

To answer the queries of group 1 and group 2, we just do some recursive calls.

In total, with a grid of R rows and Q range queries, the complexity is R\log R N^2 + Q N^3.
I just noticed, that the time limit was pretty close. But it got accepted on the first try.


Yes, that was the D&C that I was trying to cut off, by the way you can optimize it to O(R log RN^2 + QN) when you are going to answer a query, you don’t have to multiply the whole matrices, since the query fix two columns c and d, it’s only necessary to check paths of the form (c,k) and (l,d), where |k-l| \le 1, so there are only O(N) splits to check.

I was trying to come up with an explanation short and clear about the D&C approach, but couldn’t come up, you explained it way better than how I was thinking about doing it. Also, could you share your code ?


Oh, right. I actually did that optimization. (Haven’t got too much sleep this night, trying to get a few more points in the contest. Almost forgot about it already.)

Here is my solution:

So to compute the path from (x_1, y_1) to (x_2, y_2) we only need the row y_1 of the suffix matrix product (those are the number of ways from (x_1, y_1) to the row R/2 of the grid), and the column y_2 of the the prefix matrix product (those are the number of ways to reach the cell (x_2, y_2) from row R/2. A dotproduct of these two is the final solution for the query.

To avoid storing all those prefix and suffix matrices, I use the fact that the queries are sorted, both by left and right border. So while I compute the suffix matrices, I can extract the required row and store it next to the query, and when computing the prefix matrices, I can compute the dotproduct.

So the final complexity is in O(R \log R N^2 + Q N)


Here is a simple code for maximum.

struct P {
	int value;
	int accumulated;
stack<P> A, B;
P merge(P x, P y){ // x is on top of y
	x.accumulated = max(x.accumulated, y.accumulated);
	return x;
void push(int value){
	P element;
	element.value = element.accumulated = value;
		element = merge(element,;
void pop(){
			P element =;
			element.accumulated = element.value;
				element = merge(element,;
int get_max(){
	int mx = -(1<<30);
		mx = max(mx,;
		mx = max(mx,;
	return mx;

Loved this problem and editorial, Are there any similar problem… (which uses simple technique (like stack queue in this problem) to make it a hard problem)?


There is this one.

1 Like

An excellent problem and a very interesting editorial.
I’d just like to share my approach. I didn’t think it in terms of stacks and queues but did the exact same thing. So it’s just another way of imagining the same solution. I’ll share if anyone didn’t solve it my way and is interested in it.

I have one more important row that I need to keep track of except the top and bottom row, which I call the intermediate row.
The idea is that I divide my graph into two parts, one above the intermediate row, and one below it. For every row, I want to keep track of no. of ways from each column of that row to every column in the intermediate row. This can, of course, be done with dynamic programming.
If I can maintain the intermediate row, I can answer the path query in O(N^2) time in the following way-

Suppose I get a query from (1, 1) to (10, 1) and my intermediate row is 4 (suppose). I have represented the nodes in the graph in the format (row, column).
I know that this query can be broken down into N steps of the form-
ways from (1,1) to (4,1) x ways from (4,1) to (10,1) +
ways from (1,1) to (4,2) x ways from (4,2) to (10, 1) +
. . . + ways from (1,1) to (4, N) x ways from (4,N) to (10,1)

Now all that remains is to maintain the intermediate row efficiently.
This can be done in the following way-

Whenever I get an add query, I just add the new row and calculate all the ways for each of the columns of the new row to every column of the intermediate row in O(N^2) time since I already have the same for the row above it.

When I get a remove query,
If the intermediate row is below my top row, I can just move on the next row.
If the intermediate row is the row that I want to remove, I set the intermediate row to be the last row available right now and calculate all the ways using DP from last row to the current first row.

Time Complexity
For every row, I am calculating its ways to the intermediate row twice. Once when the intermediate row is above the current row and once when it is below the current row. So its O(total_rows * N^2 + no_of_path_queries * N^2)



I used the same approach as well but had to do a lot of optimizations to get it to work. Till the end I thought that this was the intended solution. No wonder that I had to optimize till my C++ code looked like C.

can you please post the link to your solution?

You can see it, by clicking in one of the options in the Solutions section.

Good Job Problem Setter. An excellent problem.

seriously I was trying from last day to just find out the way, how your code working and i really fucked up now, I tried my approach with dp[q][n][n] but at finally i can only make connection( like graph ), I don’t no how i can calculate combination and store them :nauseated_face: