A_S_P - Editorial


Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: Kashyap Panchani
Testers: Nishank Suresh, Tejas Pandey
Editorialist: Nishank Suresh




Euler tour of a tree, segment trees with lazy propagation, binary lifting


Given a tree on N vertices each with a color assigned to it, support Q queries on it, each being one of the following:

  • Given u, change the root to u
  • Given u and c, set every vertex in the subtree of u to have color c
  • Given u and c, count the number of vertices in the subtree of u that have color c


First, let’s ignore the root-change updates and try to just solve the other two queries for a fixed root.

Dealing with color changes and queries

For now let’s assume the root of the tree is 1.

A useful technique when dealing with updates and queries on subtrees is the Euler tour of a tree.
This allows us to convert subtree updates/queries into subarray updates/queries instead, which are much easier to deal with.

So, build the Euler tour of the given tree. Let B denote the array of colors in Euler tour order. We need to support the following:

  • Given c and a range [L, R], set B_i := c for every L \leq i \leq R
  • Given c and a range [L, R], count the number of L \leq i \leq R such that B_i = c

There are a few different ways to handle this. One is to use a segment tree with lazy propagation.


Build a segment tree on B.
Each node of the segment tree stores a frequency map of all the elements in the subarray it represents. Let freq[node] denote this map for a given node.

With this, querying a range for the count of c is trivial: it’s simply the sum of freq[node][c] across all the \mathcal{O}(\log N) segtree nodes that cover the range [L, R], so each query can be answered in \mathcal{O}(\log^2 N) or such.

Handling updates can be done with the help of lazy propagation.

Suppose we want to set the range [L, R] to c.

Let [l, r] be one range of the segment tree that fully lies within [L, R], corresponding to the node node.
Then, do the following:

  • For each x \in freq[node] and each ancestor a of node, perform freq[a][x] -= freq[node][x]
  • Set freq[node] to contain the single element c with a frequency of r-l+1, and also increase freq[a][c] by r-l+1 for every ancestor
  • Set a flag at this node to lazily propagate this information to its children when it is needed.

While this might make a single update take a lot of time, the total number of operations is bounded by \mathcal{O}((N+Q)\log ^2 N) map operations across all updates.


Each element of some map affects only its \mathcal{O}(\log N) ancestors, and whenever we happen to process it, it is completely deleted.

Building the segment tree itself creates \mathcal{O}(N\log N) map elements, and then each update creates \mathcal{O}(\log N) more.
So, we create \mathcal{O}((N+Q)\log N) map elements in total, each of which costs \mathcal{O}(\log N) map operations to delete.

Note that depending on your implementation, you might need to use a reasonably fast hashmap to not get TLE: for example, the one described here.

Handling root changes

Changing the root appears to change the structure of the Euler tour completely, and we can’t afford to rebuild it after each change.

However, as it turns out, we don’t need to rebuild the Euler tour!

Let’s root the tree at 1 and compute its Euler tour, as above. Now, consider two vertices r and u.

  • If r does not lie in the subtree of u, then when rooting the tree at r, the subtree of u is the exact same as the current subtree of u.
  • If r does lie in the subtree of u, then when rooting the tree at r, the subtree of u is the entire tree except the subtree of v, where v is the child of u that contains r.
    • Finding v is a simple application of binary lifting.

In terms of the Euler tour we first computed,

  • The first case just corresponds to a single subarray in it
  • The second case corresponds to the entire array except some subarray, which can be thought of as two disjoint subarrays instead

Either way, even if the root is changed we only need to update/query at most two ranges each time, so our time complexity doesn’t change!


\mathcal{O}((N+Q)\log^2 N) or \mathcal{O}((N+Q)\log^3 N) per testcase.


Setter's code (C++)
#define ll long long int
#define loop(i,a,b) for(ll i=a;i<b;++i)
#define rloop(i,a,b) for(ll i=a;i>=b;i--)
#define in(a,n) for(ll i=0;i<n;++i) cin>>a[i];
#define pb push_back
#define mk make_pair
#define all(v) v.begin(),v.end()
#define dis(v) for(auto i:v)cout<<i<<" ";cout<<endl;
#define display(arr,n) for(int i=0; i<n; i++)cout<<arr[i]<<" ";cout<<endl;
#define fast ios_base::sync_with_stdio(false);cin.tie(NULL);cout.tie(NULL);srand(time(NULL));
#define l(a) a.length()
#define fr first
#define sc second
#define mod 1000000007
#define endl '\n'
#define yes cout<<"Yes"<<endl;
#define no cout<<"No"<<endl;
using namespace std;
#define debug(x) cerr << #x<<" "; _print(x); cerr << endl;
void _print(ll t) {cerr << t;}
void _print(int t) {cerr << t;}
void _print(string t) {cerr << t;}
void _print(char t) {cerr << t;}
void _print(double t) {cerr << t;}
template <class T, class V> void _print(pair <T, V> p);
template <class T> void _print(vector <T> v);
template <class T> void _print(set <T> v);
template <class T, class V> void _print(map <T, V> v);
template <class T> void _print(multiset <T> v);
template <class T, class V> void _print(pair <T, V> p) {cerr << "{"; _print(p.fr); cerr << ","; _print(p.sc); cerr << "}";}
template <class T> void _print(vector <T> v) {cerr << "[ "; for (T i : v) {_print(i); cerr << " ";} cerr << "]";}
template <class T> void _print(set <T> v) {cerr << "[ "; for (T i : v) {_print(i); cerr << " ";} cerr << "]";}
template <class T> void _print(multiset <T> v) {cerr << "[ "; for (T i : v) {_print(i); cerr << " ";} cerr << "]";}
template <class T, class V> void _print(map <T, V> v) {cerr << "[ "; for (auto i : v) {_print(i); cerr << " ";} cerr << "]";}

ll add(ll x,ll y)  {ll ans = x+y; return (ans>=mod ? ans - mod : ans);}
ll sub(ll x,ll y)  {ll ans = x-y; return (ans<0 ? ans + mod : ans);}
ll mul(ll x,ll y)  {ll ans = x*y; return (ans>=mod ? ans % mod : ans);}

#define level 20
vector<map<int,int>> t;
vector<map<int,int>> old,newly_added;

void build(vector<int> &a, int v, int tl, int tr) {
    if (tl == tr) {
        map<int,int> mp;    mp[a[tl]]++;
        t[v] = mp;
    } else { 
        int tm = (tl + tr) / 2;
        build(a, v*2, tl, tm);
        build(a, v*2+1, tm+1, tr);
        for(auto i:t[2*v])  t[v][i.fr]+=i.sc;
        for(auto i:t[2*v+1])  t[v][i.fr]+=i.sc;

int query(int v, int tl, int tr, int l, int r, int x) {
    if (l > r)
        return 0;
    if(old[v].size()){ // if it has already been updated previously.
        if(tl!=tr)  old[2*v] = old[2*v+1] = old[v];
        t[v] = old[v];
        t[v].begin()->second = tr-tl+1;
    if (l == tl && r == tr) {
        if(t[v].find(x) == t[v].end())    return 0;
        return t[v][x];
    int tm = (tl + tr) / 2;
    return (query(v*2, tl, tm, l, min(r, tm), x) + query(v*2+1, tm+1, tr, max(l, tm+1), r, x));

void update(int v, int tl, int tr, int l,int r, int new_val) {
    if(l > r)   return;
    // here if in some previous query we updated range from 1 to 4 but now in another query if want to break that 1 to 4 than we have to update its children so for that reason I created old vector in that i will store the previous value.
    if(old[v].size()){  // if it has already been updated previously.
        if(tl!=tr)   old[2*v] = old[2*v+1] = old[v];
        t[v] = old[v];
        t[v].begin()->second = tr-tl+1;
    // if it is in the updation range then I have to remove those color from its parents as well so i store those color in newly_added vector.
    if(l == tl && r == tr){
        newly_added[v] = t[v];
        // for its children I am storing its current value.
        if(tl!=tr)  old[2*v] = old[2*v+1] = t[v];
        int tm = (tl+tr)/2;
        // now if its children have made some changes in his vector then we are storing it in newly_added vector so from that we have to erase those color as well from the node u.
        if(tl!=tr && newly_added[2*v].size()){
                newly_added[v] = newly_added[2*v];
            set<int> zero_count;
            for(auto i:newly_added[2*v])  {
                if(t[v][i.fr] == 0) zero_count.insert(i.fr);
            for(auto i:zero_count)  t[v].erase(i);
        if(tl!=tr && newly_added[2*v+1].size()){
                for(auto i:newly_added[2*v+1])  newly_added[v][i.fr]+=i.sc;
            set<int> zero_count;
            for(auto i:newly_added[2*v+1])  {
                if(t[v][i.fr] == 0) zero_count.insert(i.fr);
            for(auto i:zero_count)  t[v].erase(i);

vector<vector<int>> vec;
vector<vector<int>> parent;
vector<int> col,in,out;
vector<int> v,height;

void precomputeSparseMatrix(int n)
    for (int i=1; i<level; i++)
        for (int node = 1; node <= n; node++)
                parent[node][i] = parent[parent[node][i-1]][i-1];

int binary_lift(ll vl,ll curr_node){
            curr_node = parent[curr_node][i];
    return curr_node;

void dfs(int i,int par,int &time,int h){
    // cerr<<i<<" ";
    in[i] = ++time;
    height[i] = h;
    parent[i][0] = par;
    for(auto j:vec[i])  {
    out[i] = ++time;
    // cerr<<i<<" ";

ll cnt = 0,tot1 = 0,tot2 = 0;

void solve(){
    int n;   cin>>n;
    assert(n>=1 && n<=3e5);
    col.assign(n,0);  in(col,n);
    loop(i,0,n) assert(col[i]>=1 && col[i]<=1e9);
        int a,b; cin>>a>>b;
        assert(a>=1 && a<=n);
        assert(b>=1 && b<=n);
    int time = 0;
    int sz = v.size();
    sz = ceil(1.00*log2(v.size()));
    sz = (1ll<<sz); sz*=2;
    int q;   cin>>q;
    int curr_root = 1;
        int type;    cin>>type;
        // debug(type)
        assert(type >= 1 && type <= 3);
        if(type == 1){
            int x,col;   cin>>x>>col;
            assert(x>=1 && x<=n);
            assert(col>=1 && col<=1e9);
            if(curr_root != 1 && (in[x] <= in[curr_root] && out[x] >= out[curr_root])){
                int diff = height[curr_root]-height[x];
                int node = (diff ? binary_lift(diff-1,curr_root) : curr_root);
                if(diff == 0){
                    int l = 1,r = v.size();
                    int l_1 = 1,r_1 = in[node]-1,l_2 = out[node]+1,r_2 = v.size();
                int l = in[x],r = out[x];
        else if(type == 2){
            int x;   cin>>x;
            assert(x>=1 && x<=n);
            curr_root = x;
            int x,u; cin>>x>>u;
            assert(x>=1 && x<=n);
            assert(u>=1 && u<=1e9);
            ll ans = 0;
            if(curr_root != 1 && (in[x]<=in[curr_root] && out[x] >= out[curr_root])){
                int diff = height[curr_root]-height[x];
                int node = (diff ? binary_lift(diff-1,curr_root) : curr_root);
                if(diff == 0){
                    int l = 1,r = v.size();
                    int l_1 = 1,r_1 = in[node]-1,l_2 = out[node]+1,r_2 = v.size();
                int l = in[x],r = out[x];

int main()
    int t; cin>>t;
    while(t--) solve();
    return 0;
Editorialist's code (C++)
#include "bits/stdc++.h"
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());

#include <bits/extc++.h>
using namespace __gnu_pbds;
struct chash {
	const int RANDOM = (int64_t)(make_unique<char>().get()) ^ chrono::high_resolution_clock::now().time_since_epoch().count();
	static uint64_t hash_f(uint64_t x) {
		x += 0x9e3779b97f4a7c15;
		x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
		x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
		return x ^ (x >> 31);
	static unsigned hash_combine(unsigned a, unsigned b) { return a * 31 + b; }
	int operator()(int x) const { return hash_f(x)^RANDOM; }
using hashmap = gp_hash_table<int, int, chash>;

struct Node {
	hashmap freq;
	Node *l = 0, *r = 0, *par = 0;
	int lo, hi;
	int change = false, val = 0;
	Node(int _lo,int _hi):lo(_lo),hi(_hi){}
	int query(int L, int R, int x) {
		if (R <= lo || hi <= L) return 0;
		if (L <= lo && hi <= R) {
			return freq[x];
		return l->query(L, R, x) + r->query(L, R, x);
	void set(int pos, int x) {
		if (lo+1 == hi) return;
		if (pos >= r->lo) r->set(pos, x);
		else l->set(pos, x);
	void rangeset(int L, int R, int x, bool updpar = false) {
		if (R <= lo || hi <= L) return;
		if (L <= lo && hi <= R) {
			change = true;
			val = x;
			if (updpar) {
				auto cur = par;
				while (cur) {
					for (auto &[val, ct] : freq) {
						cur -> freq[val] -= ct;
						if (cur -> freq[val] == 0) (cur -> freq).erase(val);
					cur -> freq[x] += hi - lo;
					cur = cur -> par;
			freq[x] = hi - lo;
		l->rangeset(L, R, x, updpar);
		r->rangeset(L, R, x, updpar);
	void push() {
		if (!l) {
			int mid = lo + (hi - lo)/2;
			l = new Node(lo, mid); r = new Node(mid, hi);
			l -> par = r -> par = this;
		if (change)
			l->rangeset(lo,hi,val), r->rangeset(lo,hi,val), change = false;

int main()
	ios::sync_with_stdio(false); cin.tie(0);

	int t; cin >> t;
	while (t--) {
		int n; cin >> n;
		vector<int> a(n);
		for (int i = 0; i < n; ++i) {
			cin >> a[i];
		vector<vector<int>> g(n);
		for (int i = 0; i < n-1; ++i) {
			int u, v; cin >> u >> v;
		int timer = 0;
		vector<int> in(n), out(n);
		vector<array<int, 18>> anc(n);
		auto dfs = [&] (const auto &self, int u, int p) -> void {
			in[u] = timer++;
			anc[u][0] = p;
			for (int i = 1; i < 18; ++i) anc[u][i] = anc[anc[u][i-1]][i-1];
			for (int v : g[u]) {
				if (v == p) continue;
				self(self, v, u);
			out[u] = timer;
		auto isanc = [&] (int u, int v) {return in[u] <= in[v] and out[u] >= out[v];}; // Is u an ancestor of v?
		auto getchild = [&] (int u, int v) { // Which child of v contains u?
			for (int i = 17; i >= 0; --i) if (!isanc(anc[u][i], v)) u = anc[u][i];
			return u;
		dfs(dfs, 0, 0);

		Node *segtree = new Node(0, n);
		for (int i = 0; i < n; ++i) {
			segtree -> set(in[i], a[i]);
		int q; cin >> q;
		int root = 0;
		while (q--) {
			int type; cin >> type;
			if (type == 1) {
				int u, col; cin >> u >> col; --u;
				if (u == root) segtree -> rangeset(0, n, col, true);
				else if (in[u] <= in[root] and out[u] >= out[root]) {
					int v = getchild(root, u);
					segtree -> rangeset(out[v], n, col, true);
					segtree -> rangeset(0, in[v], col, true);
				else segtree -> rangeset(in[u], out[u], col, true);
			} else if (type == 2) {
				cin >> root; --root;
			} else {
				int u, col; cin >> u >> col; --u;
				int ans = 0;
				if (u == root) ans = segtree -> query(0, n, col);
				else if (in[u] <= in[root] and out[u] >= out[root]) {
					int v = getchild(root, u);
					ans = segtree -> query(out[v], n, col) + segtree -> query(0, in[v], col);
				else ans = segtree -> query(in[u], out[u], col);
				cout << ans << '\n';

In setter’s code, In update function, in the lazy propagation code,
t[v] = old[v];
t[v].begin()->second = tr-tl+1;
What does these two lines mean?
t[v] is made old[v] - I think this is lazy propagation
then what’s t[v].begin() ?

@zxy090909 Can you please explain what are the vectors t, old and newly_added in your code…

So for the first line you guessed it correctly, it is for lazy propagation and now for v we know that if old[v] has size that means that color of all the leaf nodes in segment tree that is below v is the same. Hence it has size equal to 1. So in the first line we assign color from old vector but for the frequency we have to make it as number of leaf nodes in the segment tree that is below v. So we transfer that color in the first line(from the old vector) and set its frequency in the second line(it will be tr-tl+1).

1 Like