RWALKS - Editorial

PROBLEM LINK:

Practice
Div-2 Contest
Div-1 Contest

Author: Sayantan Jana
Tester: Felipe Mota
Editorialist: Sayantan Jana

DIFFICULTY:

Medium-Hard

PREREQUISITES:

Lowest Common Ancestor, Centroid Decomposition, Tries

PROBLEM:

Given a tree with N nodes and Q queries of 2 types :

  • type 1 u d : You take a random walk of length at max d from node u.

In random walk, you randomly choose a neighboring node to move into which has not been visited so far in the current walk. You stop only when either you have walked a length d or you are at a node which has no unvisited neighbor.

  • type 2 u : Print the expected number of times the node u has been visited so far. If the answer is \frac{P}{Q}. Print P \cdot Q^{-1} mod 10^9+7.

QUICK EXPLANATION:

  • We can find the probability of reaching v from u in O(1) using lowest common ancestor and some preprocessing.
  • To efficiently handle the queries, we decompose the tree using centroid decomposition, where each node has O(log(N)) ancestors. Now at each node and at each edge in the decomposed tree, we maintain a container that supports insertion of pairs of form \{a,b\} and for a given d answers \sum b_i such that a_i \geq d in O(log(N)). Trie is used in setter’s solution.
  • For query 1 u d, visit each of the ancestors of u in the decomposed tree, let anc be current ancestor and e be the edge through which we visited the ancestor. Add to the container for anc, \{d-dist(anc,u), \frac{1}{degree[u]} \cdot P( u \rightarrow anc )\} . Also add the same pair to the container for e.
  • For query 2 u, visit each of the ancestors of u in the decomposed tree, let anc be current ancestor and e be the edge through which we visited the ancestor. Add to the answer, P'( anc \rightarrow u ) \cdot \sum b_i such that a_i \geq dist(anc,u) from the container for anc. Subtract from the answer, P'( anc \rightarrow u ) \cdot \sum b_i such that a_i \geq dist(anc,u) from the container for e. P' differs from P at the first term of product.
  • Note, we maintain a separate container for the starting node for the same reason, difference between P' and P. For query 2's addition at the starting node, we need P instead of P'.

EXPLANATION:

Probability of reaching v from u in random walk

Consider a path between u and v : u \rightarrow v_1 \rightarrow v_2 ... v.

  • At node u, we randomly choose any of the neighbours of u, each with a probability of \frac{1}{degree[u]}.
  • We have chosen node v_1, at node we can again choose any of the neighbours of v_1 but not u since it is already visited. As we are dealing with a tree, other than u, there cannot be any other already visited neighbour of v_1, if v_1 is the most recently visited node. The probability of going from v_1 to v_2 is hence \frac{1}{degree[v_1]-1}.
  • At each of the intermediate nodes, the probability of moving from v_i to v_{i+1} (maybe the final target vertex v even) is \frac{1}{degree[v_i]-1}.

Thus the probability of reaching v from u :

P(u \rightarrow v) = \frac{1}{degree[u]} \cdot \prod_i \frac{1}{degree[v_i]-1}

We define another term intermediate probability of reaching v from u, i.e., given that u was not the starting node but some node that got visited in the random walk, what would be the probability of reaching v,

P'(u \rightarrow v) = \frac{1}{degree[u]-1} \cdot \prod_i \frac{1}{degree[v_i]-1}

We can get both P and P' in O(1) by finding the lowest common ancestor of the 2 nodes u and v and some initial preprocessing.

If we root the tree and perform the initial LCA preprocessing and prepare prefix\_prob[] as :

prefix\_prob[root] = 1
prefix\_prob[u] = prefix\_prob[parent(u)] \cdot \frac{1}{degree[u]-1}

we can get P and P' as :

P(u \rightarrow v) = \frac{1}{degree[u]} \cdot P'(parent[u] \rightarrow LCA(u,v)) \cdot P'(LCA(u,v) \rightarrow v)
P'(u \rightarrow v) = P'(u \rightarrow LCA(u,v)) \cdot P'(LCA(u,v) \rightarrow v)
P'(u \rightarrow LCA(u,v)) = \frac{prefix\_prob[u]}{prefix\_prob[LCA(u,v)]}
P'(LCA(u,v) \rightarrow v) = \frac{1}{degree[LCA(u,v)-1]} \cdot \frac{prefix\_prob[parent(v)]}{prefix\_prob[LCA(u,v)]}

Take some care to handle the cases when u is an ancestor of v or vice versa.

Centroid Decomposition on the tree

We cannot do a full length traversal of length d, as it would take a lot of time. Instead, we apply centroid decomposition on the tree so that we have O(log N) nodes as ancestors for each of the nodes. We plan to traverse from the starting node to the root of decomposed tree for each of the queries.

For each node and edge in the decomposed tree, we maintain a container that supports fast insertion of pairs of form \{a,b\} and for a given d answers quickly \sum b_i such that a_i \geq d. Let’s call them container[node] and container[e]. Trie is used in setter’s solution which allows both insertion and answering in O(log(max(a_i)). The technique is similar to the one used in order statistic trie. One can opt for any other data structure as well.

Also we maintain an additional container for each of the nodes, which we will discuss about later. Let’s call it starting\_container[node].

Refer Centroid Decomposition of a Tree by Tanuj Khattar for understanding.

How trie acts as the essential container

Essentially we use a bit trie where each node has 2 outgoing edges, for bit 0 and 1 respectively. Additionally we maintain a parameter sum at a node.

Insertion of \{a,b\} :

  • Convert a to its binary representation and make a string of length log(N), that is the maximum possible length, fill in with leading 0s to the binary representation of a to make it of length log(N).
  • Insert the binary string into the trie. At each of the nodes in the trie that we visit while inserting the string, add b to the sum parameter of the node.

Answering for a given d , i.e., to find \sum b_i such that a_i \geq d:

  • Convert d to its binary representation and make a string of length log(N), that is the maximum possible length, fill in with leading 0s to the binary representation of d to make it of length log(N).
  • Now search for the string constructed, each time we move from a node to a child node corresponding to bit 0, add to the answer sum at the child node corresponding to bit 1.
  • Finally as we reach a leaf node in the trie, add to the answer sum at that node.
  • Note any time if we are not able to find a child node corresponding to the next bit, we return the answer calculated so far.

Handling query of type 1

For query of type 1, u d :

  • Add to starting\_container[u], \{d,1\}.
  • We traverse from the node u to the root of the decomposed tree.
  • Say we are at an ancestor anc and the edge that leads to anc in the traversal be e. Insert the pair \{d-dist(anc,u),P(u \rightarrow anc)\} to both container[anc] and container[e].

Handling query of type 2

Given a node u:

  • Add to the answer \sum b_i from container[u] such that a_i \geq 0.
  • Add to the answer \sum b_i from starting\_container[u] such that a_i \geq 0.
  • We traverse from the node u to the root of the decomposed tree.
  • Say we are at an ancestor anc and the edge that leads to anc in the traversal be e. Add to the answer, P'(anc \rightarrow u) \cdot \sum b_i from container[anc] such that a_i \geq dist(u,anc). Subtract from the answer, P'(anc \rightarrow u) \cdot \sum b_i from container[e] such that a_i \geq dist(u,anc). Add to the answer, P(anc \rightarrow u) \cdot \sum b_i from starting\_container[anc] such that a_i \geq dist(u,anc).

Essentially what we did in the handling for the 2 queries was :

  • During query 1, we have put an entry for a random walk in the container and edge for each of the ancestors of the node.
  • During query 2, we add up the expectations of having been reached from a node in the subtree of an ancestor, however one can add up the expectations repeatedly.
    Say the path from queried node u to root in the decomposed tree is u \rightarrow anc_{u,1} \rightarrow anc_{u,2} ... root and suppose a random walk was made from a node v and the path from node v to root in the decomposed tree is v \rightarrow anc_{v,1} \rightarrow anc_{v,2} ... root. If anc_{u,i} = anc_{v,j}, we would have a repititive adding at nodes anc_{u,i+1}, anc_{u,i+2} .. root.
    Hence we subtract the expectations of having been reached through the edge , such that even if we have a repititve adding at anc_{u,i+1}, it gets removed by subtracting from the container for the edge connecting anc_{u,i} \rightarrow anc_{u,i+1}. Same follows for anc_{u,i+2} .. root.
  • As one can understand the purpose of the starting\_container[] was to estimate the expected reach value of the nodes in the subtree of the decomposed tree for the starting node. It doesn’t require any subtraction though since it doesn’t run at the risk of repititive adding. We had to maintain it as a separate container since the expectation calculation at each ancestor requires actual probability P if the ancestor was the concerned starting node for random walk and intermediate probability P' otherwise.

The initial preprocessing for LCA is done through a sparse table which took O(N log N) time so that we could answer the queries for LCA and related in O(1) later.
We traverse through each of the ancestors of a node in both the queries, which is O(log N) and at each node we perform operations that cost O(log N) again. Thus the overall complexity is O(N log N + Q log^2 N).

SOLUTIONS:

Setter's Solution
#include<bits/stdc++.h>
using namespace std;

const int M = (1<<20)+5;
const int L = 21;
const int md = 1e9+7;
 
int n,q;
 
int pwr(int a,int n,int m)
{
	int p = 1;
	while(n)
	{
		if(n&1) p = 1ll*p*a%m;
		a = 1ll*a*a%m;
		n >>= 1;
	}
	return p;
}

vector<int> adj[M];
int logbase2[M],inv[M];
int LOG;
/*----------- Pre-Processing ------------*/

int timer,euler_index;
int level_sparse_table[M][L],ancs_sparse_table[M][L];
int start_time[M],lv[M],deg[M],parent[M];
int prefix_prod[M],inv_prefix_prod[M];

void preprocess_dfs(int u,int par,int lev)
{
	start_time[u] = timer++;
	lv[u] = lev;
	deg[u] = (int)adj[u].size();
	parent[u] = par;

	if(u == 1) prefix_prod[u] = 1; /* for root handle */
	else prefix_prod[u] = 1ll*prefix_prod[par]*inv[deg[u]-1]%md;
	inv_prefix_prod[u] = pwr(prefix_prod[u],md-2,md);

	level_sparse_table[euler_index][0] = lev;
	ancs_sparse_table[euler_index][0] = u;
	euler_index++;

	for(auto v:adj[u])
	{
		if(v == par) continue;
		preprocess_dfs(v, u, lev+1);
		level_sparse_table[euler_index][0] = lev;
		ancs_sparse_table[euler_index][0] = u;
		euler_index++;
	}
	timer++; 
}

int get_lca(int u,int v)
{
	int l = min(start_time[u],start_time[v]);
	int r = max(start_time[u],start_time[v]);
	int p = logbase2[r-l+1];
	if(level_sparse_table[l][p] <= level_sparse_table[r-(1<<p)+1][p])
		return ancs_sparse_table[l][p];
	else
		return ancs_sparse_table[r-(1<<p)+1][p];
}

int get_dist(int u,int v)
{
	return lv[u]+lv[v]-2*lv[get_lca(u,v)];
}

int get_prob(int u,int v)
{
	if(u == v) return 1;
	int lca = get_lca(u,v);
	int up = 1ll*prefix_prod[parent[u]]*inv_prefix_prod[lca]%md;
	up = 1ll*up*inv[deg[u]]%md;
	int down = 1ll*prefix_prod[parent[v]]*inv_prefix_prod[lca]%md;
	int turn = 1ll*down*inv[deg[lca]-1]%md;
	down = 1ll*down*inv[deg[lca]]%md;
	
	if(lca == v) /* if v is an ancestor of u */
		return up;
	else if(lca == u) /* if u is an ancestor of v */
		return down;
	else
		return 1ll*up*turn%md;
}

int get_mid_prob(int u,int v)
{
	if(u == v) return 1;
	int lca = get_lca(u,v);
	int up = 1ll*prefix_prod[u]*inv_prefix_prod[lca]%md;
	int down = 1ll*prefix_prod[parent[v]]*inv_prefix_prod[lca]%md;
	down = 1ll*down*inv[deg[lca]-1]%md;

	if(lca == v) /* if v is an ancestor of u */
		return up;
	else if(lca == u) /* if u is an ancestor of v */
		return down;
	else
		return 1ll*up*down%md;
}

void preprocess_lca()
{
	for(int j=1;(1<<j)<=euler_index;++j)
	{
		for(int i=0;(i+(1<<j)-1)<euler_index;++i)
		{
			if(level_sparse_table[i][j-1] < level_sparse_table[i+(1<<(j-1))][j-1])
			{
				level_sparse_table[i][j] = level_sparse_table[i][j-1];
				ancs_sparse_table[i][j] = ancs_sparse_table[i][j-1];
			}
			else
			{
				level_sparse_table[i][j] = level_sparse_table[i+(1<<(j-1))][j-1];
				ancs_sparse_table[i][j] = ancs_sparse_table[i+(1<<(j-1))][j-1];
			}
		}
	}
}

/*-----------------Decomposition Part--------------------------*/

int sz[M],ctree_par[M];

int estimate_subtree_sizes(int u,int par)
{
	sz[u] = 1;
	for(auto v:adj[u])
		if(v != par) sz[u] += estimate_subtree_sizes(v,u);
	return sz[u];
}

int get_centroid(int u,int par,int frag_n)
{
	for(auto v:adj[u])
		if(v != par && sz[v]>frag_n/2) return get_centroid(v,u,frag_n);
	return u;
}

void decompose(int u,int par)
{
	int frag_n = estimate_subtree_sizes(u,par);
	int centroid = get_centroid(u,par,frag_n);
	ctree_par[centroid] = par;
	for(auto v:adj[centroid])
	{
		for(int i=0;i<(int)adj[v].size();++i)
		{
			if(adj[v][i] == centroid)
				adj[v].erase(adj[v].begin()+i);
		}
		decompose(v,centroid);
	}
}

/*-----------------Trie--------------------------*/

typedef struct node{
	node* nxt[2];
	int sum;
}node;

node* newnode()
{
	node* tmp = new node;
	tmp->nxt[0] = tmp->nxt[1] = NULL;
	tmp->sum = 0;
	return tmp;
}

node* insert(node* root,int a,int b)
{
	node* cur = root;
	for(int i=LOG;i>=0;i--)
	{
		int bit = (a>>i)&1;
		if(cur->nxt[bit] == NULL)
		{
			node* tmp = newnode();
			cur->nxt[bit] = tmp;
		}
		cur = cur->nxt[bit];
		cur->sum = (cur->sum+b)%md;
	}
	return root;
}

int get_sum(node* root,int a)
{
	node* cur = root;
	int ans = 0;
	for(int i=LOG;i>=0;i--)
	{
		int bit = (a>>i)&1;
		if(bit == 0 && cur->nxt[1] != NULL)
			ans = (ans+cur->nxt[1]->sum)%md;
		if(cur->nxt[bit] == NULL)
			return ans;
		else
			cur = cur->nxt[bit];
	}
	ans = (ans+cur->sum)%md;
	return ans;
}

int main()
{
	ios::sync_with_stdio(false);
	cin.tie(0);
	cout.tie(0);

	cin >> n >> q;

	for(int i=1;i<n;++i)
	{
		int u,v;
		cin >> u >> v;
		adj[u].push_back(v);
		adj[v].push_back(u);
	}

	int next_pwr = 1;
	int logval = 0;
	for(int i=1;i<=2*n;++i)
	{
		if(i == next_pwr)
		{
			logbase2[i] = logval;
			next_pwr <<= 1;
			++logval;
		}
		else
			logbase2[i] = logbase2[i-1];
	}
	for(int i=1;i<=n-1;++i)
		inv[i] = pwr(i,md-2,md);

	LOG = logbase2[2*n];
	preprocess_dfs(1,0,0);
	preprocess_lca();
	decompose(1,0);

	vector<node*> add(n+1);
	vector<node*> sub(n+1);
	vector<node*> add_src(n+1);

	for(int i=1;i<=n;++i)
	{
		add[i] = newnode();
		sub[i] = newnode();
		add_src[i] = newnode();
	}

	int last_ans = 0;
	while(q--)
	{
		int typ,u,d;
		cin >> typ;
		if(typ == 1)
		{
			cin >> u >> d;
			u = (u+last_ans)%n + 1;
			d = (d+last_ans)%n + 1;
			int v = u;
			add_src[u] = insert(add_src[u],d,1);
			int last = u;
			int cur = ctree_par[u];
			int dist = get_dist(cur,u);
			while(cur != 0)
			{
				if(dist<=d)
				{
					add[cur] = insert(add[cur],d-dist,get_prob(u,cur));
					sub[last] = insert(sub[last],d-dist,get_prob(u,cur));
				}
				last = cur;
				cur = ctree_par[cur];
				dist = get_dist(cur,u);
			}
		}
		else
		{
			cin >> u;
			u = (u+last_ans)%n + 1;
			int ans = get_sum(add[u],0);
			ans = (ans+get_sum(add_src[u],0))%md;
			int last = u;
			int cur = ctree_par[u];
			int dist = get_dist(cur,u);
			while(cur != 0)
			{
				ans = (ans+1ll*get_sum(add[cur],dist)*get_mid_prob(cur,u)%md)%md;
				ans = (ans+md-1ll*get_sum(sub[last],dist)*get_mid_prob(cur,u)%md)%md;
				ans = (ans+1ll*get_sum(add_src[cur],dist)*get_prob(cur,u)%md)%md;
				last = cur;
				cur = ctree_par[cur];
				dist = get_dist(cur,u);
			}
			cout << ans << "\n";
			last_ans = ans;
		}
	}
	return 0;
}
Tester's Solution
5 Likes

Nice subtasks! I used a segment tree and got 35 points. Didn’t even think of LCA :"(

2 Likes

Right I did the same , seeing the linear tree, and only 2 type of probability possible namely 1 or 1/2, segment tree struck in my mind with query() returning pair of count of 1s and 1/2s

1 Like

More such types of subtasks in hard problems are appreciated, where you can apply a complete different approach to get partial points