DIFVAL - Editorial


Contest Link

Author: Sofiia
Tester: Felipe Mota
Editorialist: Rajarshi Basu




Persistent Segment Trees, Small-to-Large Merging


Danya has a rooted tree with N \leq 10^5 nodes (numbered 1 through N). Node 1 is the root and for each i (2 \le i \le N), the parent of the i-th node is p_i. For each valid i, the i-th node also has a value a_i.

Danya really loves his tree and wants to play with it. You should answer his Q queries. In each query:

  • You are given two integers X and D.
  • Consider the set S of all nodes v that lie in the subtree of node X (including X) such that the distance between X and v does not exceed D.
  • The answer to this query is the number of different values a_v among all v \in S.

Note that there are 5 such test-cases, and that the queries are online.


I promise this will be quick!

The solution hinges on the fact that if we can maintain, for every subtree, the depth of the highest node of every colour inside it, we can just count using a persistent segment tree how many of those depths lie within distance D. To merge these segment trees fast, traverse on the smaller segment tree and merge. Note that this does not handle the case where the depth of a colour should be updated: if the two segment trees contained two different depths for the same colour, we would have accounted for both of them. To correct this, we also need to delete the occurrence of that colour when it becomes “deactivated”. To find the first ancestor where a particular node (and thereby the depth of the colour associated with that node) gets deactivated, we can use dfs start and end times. The fact that only O(n) such removals are necessary helps keep the complexity O(nlogn).


Observation 1

First off, one might be tempted to think “But oh! we are dealing with subtrees so why not Euler Tour, DFS array, Etc Etc”. The catch is the D variable. And that makes everything related to subtree-being-a-contiguous-range-in-array a tad hard to do.


One idea you might have is, can’t we just add the distance variable like another “dimension” in our search parameter, and then find distinct values in a range using segment trees? (more detail about the technique here). Well, maybe. But it’s really hard to do. If you can figure out a fast enough way to do so, do mention it in the comments.

Observation 2

In general, a good idea is to simplify our problem and try to solve it in that simpler setting. What can be a better simplification here than to do away with D? Note that we don’t want to go back to thinking about subtree-range transforms, since we already know it’s somewhat hard to do. Nonetheless… what can we do?

A possible solution after Observation 2

Well, since there is no D, and we are counting number of distinct colors in a subtree, a popular technique for this is called “Small to large” merging. (Click here and here to learn more).
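As a rough sketch of what small-to-large merging looks like for this D-less version (counting distinct colors in every subtree), here is a minimal, self-contained example. All names here are illustrative and mine, not taken from the setter's code:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Returns ans[v] = number of distinct colors in the subtree of v, for a
// rooted tree given by parent[] (root is node 0, parent[0] == -1).
// Small-to-large: each node keeps a set of colors; when absorbing a
// child's set, always iterate over the SMALLER of the two sets, so every
// element is moved only O(log n) times overall.
vector<int> distinctColorsPerSubtree(const vector<int>& parent,
                                     const vector<int>& color) {
    int n = (int)parent.size();
    vector<vector<int>> children(n);
    for (int v = 1; v < n; v++) children[parent[v]].push_back(v);

    // Build a BFS order so that processing it in reverse visits every
    // child before its parent.
    vector<int> order;
    order.push_back(0);
    for (int i = 0; i < (int)order.size(); i++)
        for (int c : children[order[i]]) order.push_back(c);

    vector<set<int>> colors(n);
    vector<int> ans(n);
    for (int i = n - 1; i >= 0; i--) {
        int v = order[i];
        colors[v].insert(color[v]);
        for (int c : children[v]) {
            // keep the larger set at v, iterate over the smaller one
            if (colors[c].size() > colors[v].size()) swap(colors[c], colors[v]);
            for (int x : colors[c]) colors[v].insert(x);
            colors[c].clear();
        }
        ans[v] = (int)colors[v].size();
    }
    return ans;
}
```

The `swap` before merging is the whole trick: it guarantees we always walk the smaller set, which is what gives the O(n log^2 n) total (log n moves per element, log n per set insertion).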

Offline or online?

Well, it is possible to store the answer for all vertices and answer queries online, but that feels like cheating. Instead, can we make it actually online?

Can you?

Turns out, you can answer queries online. Just make whatever data structure you are using to store the “set” of all colors persistent.
In particular, when you are doing the dfs and merging subtrees, while inserting elements into the “set” of the heavy child, you get back a new version which you then assign to the current node. In this way, you still maintain the “live” version of the data structure at each node, and can truly answer queries online! Note that “set” doesn’t necessarily mean STL Set, but rather a set-like data structure of your choice.

Implementation 1

Whenever we are using persistence, it is a good idea to use a segment tree as the underlying data structure. Hence, in the above case, we maintain the set in the form of a segment tree. I will leave the details of which array we build the segment tree on as an exercise :stuck_out_tongue:

Implementation 2

A question might be, won’t segment trees take a lot of space? For example, if we built a segment tree on the identity array for color (where if we insert a node of color C, we just do Arr[C]++), won’t we need to store a segment tree of size 10^5 at each node? Won’t that take huge amounts of memory? Oh… persistence… But wait! We are using persistence only to merge segment trees. What if it was a star graph, i.e., node 1 connected to all nodes 2,3,4 \dots N. Wouldn’t that take O(N^2) memory? Well, yes.

HOWEVER, we can reduce the memory usage effectively. The trick is to use dynamic segment trees.
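To make "dynamic" concrete, here is a minimal persistent segment tree with on-demand node allocation (path copying). Node 0 is a shared null node representing every empty subtree, which is exactly why an empty tree costs no memory. The class and member names are my own, not the setter's:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Persistent point-update / range-sum segment tree over [0, SZ).
// Each update copies only the O(log SZ) nodes on the root-to-leaf path
// and returns a new root; every old root remains a valid version.
struct PersistentSumTree {
    static const int SZ = 1 << 17;          // value range, power of two
    vector<int> sum{0}, lc{0}, rc{0};       // node 0 = the empty tree

    int update(int root, int pos, int delta) { return upd(root, 0, SZ - 1, pos, delta); }
    int query(int root, int l, int r) { return qry(root, 0, SZ - 1, l, r); }

private:
    int clone(int t) {                      // path copying = persistence
        sum.push_back(sum[t]); lc.push_back(lc[t]); rc.push_back(rc[t]);
        return (int)sum.size() - 1;
    }
    int upd(int t, int l, int r, int pos, int d) {
        t = clone(t);
        sum[t] += d;
        if (l == r) return t;
        int m = (l + r) / 2;
        if (pos <= m) lc[t] = upd(lc[t], l, m, pos, d);
        else          rc[t] = upd(rc[t], m + 1, r, pos, d);
        return t;
    }
    int qry(int t, int l, int r, int x, int y) {
        if (t == 0 || y < l || r < x) return 0;  // null node contributes 0
        if (x <= l && r <= y) return sum[t];
        int m = (l + r) / 2;
        return qry(lc[t], l, m, x, y) + qry(rc[t], m + 1, r, x, y);
    }
};
```

Because untouched subtrees are shared between versions, N updates cost O(N log N) nodes in total, no matter how the tree of versions is shaped.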

Observation 3

Let the depth of u be X. Then any node in subtree of u with depth at most X+D is valid. Seems rather \dots uniform right?

Observation 4

Whenever we are counting something, it is a good idea to “fix” a certain property of an element, using which we count it. What can be such a “property” of the colours (or of the nodes carrying them) that we can use to keep track of the depth of each colour, and of whether it lies within distance D of the given node u?

Using the above observations

We can just store the least-depth occurrence of every color! If we maintain only those depths, then querying a subtree reduces to counting how many of them are at most some threshold Y. (Don’t forget that whatever we maintain, we will use persistent data structures so that we can answer queries online).

How can we maintain?

It seems appropriate to use a segment tree on the frequency table on depths of nodes. Right? RIGHT?

Okay I'll explain

Here’s what I mean. Let the underlying array of the segment tree be called Arr. Let’s say we insert the node u which has color c. Then the operation that we want to support is Arr[depth[u]]++. Notice that we aren’t directly using the value c. However, it is needed because we want to ensure that only one occurrence of a node with color c exists, and that depth[u] for that node u is the minimum possible (since we want to maintain the highest occurrence of a color in a subtree). Hence, when merging two subtrees, let’s say in the data structure of the heavy child the highest node of color c was v, in the light child the best occurrence is u, and depth[v] > depth[u]. Then, we do the following:

  • Arr[depth[v]]--
  • Arr[depth[u]]++

To answer for a query node u, we go to its version of the segment tree, and just get range sum [depth[u], depth[u]+D]. (Think why)
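As a toy illustration of this bookkeeping (the depths and colors below are invented, and a plain frequency array stands in for the segment tree):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Toy model of the Arr bookkeeping above. A real solution would answer
// this range sum in O(log n) with a (persistent) segment tree.
int countWithinD(const vector<int>& Arr, int du, int D) {
    // range sum Arr[du .. du+D]: colors whose highest occurrence lies
    // within distance D below a query node at depth du
    int s = 0;
    for (int d = du; d < (int)Arr.size() && d <= du + D; d++) s += Arr[d];
    return s;
}
```

Walking through the merge correction on this toy: suppose color 1 has its highest occurrence at depth 0 and color 2 at depth 1 (`Arr[0]++; Arr[1]++;`). If a second node of color 1 shows up at depth 1 in another subtree, a naive merge does `Arr[1]++`, and the correction is `Arr[1]--` to drop the deeper duplicate; the query then returns 1 for D=0 and 2 for D=1.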

Initial Attempt at Complexity Analysis

The complexity analysis is simple for this. We use the idea from small to large, and since inserting an element into a segment tree costs O(logn), we get a total complexity of O(nlog^2n + qlogn).

Making merging faster

A faster way to merge, instead of putting elements one by one into the segment tree, is to traverse both segment trees and merge them node by node. The claim is that this makes merging the two segment trees fast! (Proof given below.) The only catch: while merging, we just merge the nodes – we don’t make any corrective updates. To explain what I mean here: let’s say we are merging two segment trees T_1 and T_2, and there exist u \in T_1 and v \in T_2 such that the color of both u,v is c. Then, after merging, our segment tree will contain both occurrences. How do we delete that extra occurrence like in our previous implementation? It’s hard, right? I mean, now you don’t keep track of which color contributed to which depth – you just keep track of the depths in the segment tree. Maybe you can keep track of the nodes and their colors separately as well… but can you? Won’t that ruin the complexity, since just maintaining that information by small-to-large needs O(nlog^2n) time?

How to handle repeated occurrence

Suppose we can find, for each node, the first ancestor at which it needs to get “removed” (rather, the first ancestor at which it gets replaced by another occurrence of the same color, from a different subtree, with lesser depth). Formally, for every node u of color c we need to find the first ancestor x such that depth[x] is greatest possible, and there exists a node v of color c with depth[u]>depth[v] and LCA(u,v) = x. Then, after we merge the segment trees at node x, for all such nodes u we update the segment tree as Arr[depth[u]] -= 1. This keeps everything correct. (Think about why this is equivalent to our previous one-by-one merging: we are doing the same additions and subtractions, just in a roundabout way.) Since only O(n) such deletions are required, we incur a total time of O(nlogn) for the mergings.

How to find these ancestors?
Hint 1:

Process each color one by one.

Hint 2:

Use depths and euler tour.

Hint 3:

Only LCA of the nodes matter.

Hint 4:

LCA of adjacent nodes are all that matter.


For every color, sort according to increasing depth. Now let’s say you are processing the x^{th} element, and you have already inserted everything with lesser depth into a Data Structure. Now, let’s say that inside this data structure, everything is sorted by DFS startTime of the nodes. Then, we just need to check for LCA with the previous and next nodes for x. You can just use a set for the data structure as sets provide iterators to move to next and previous elements easily.
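A sketch of this neighbor-check follows. The tiny tree helper and a naive parent-climbing LCA stand in for the real O(logn) machinery, and nodes of equal depth are processed in a fixed (stable) order here; the real algorithm has to be careful about such ties. All names are mine:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Minimal rooted tree with depths, DFS in-times and a naive LCA.
struct TinyTree {
    vector<int> par, dep, tin;
    vector<vector<int>> ch;
    int timer = 0;
    TinyTree(vector<int> parent) : par(parent) {
        int n = (int)par.size();
        dep.assign(n, 0); tin.assign(n, 0); ch.assign(n, {});
        for (int v = 1; v < n; v++) ch[par[v]].push_back(v);
        dfs(0);
    }
    void dfs(int v) {
        tin[v] = timer++;
        for (int c : ch[v]) { dep[c] = dep[v] + 1; dfs(c); }
    }
    int lca(int u, int v) {            // naive: climb the deeper node up
        while (u != v) {
            if (dep[u] < dep[v]) swap(u, v);
            u = par[u];
        }
        return u;
    }
};

// For the nodes of ONE color, processed in increasing depth: the deepest
// ancestor at which an earlier (shallower-or-equal) occurrence replaces
// the current node. Only LCAs with the tin-order neighbors in the set of
// already-processed nodes matter. Returns -1 for the first node.
vector<int> killingAncestor(TinyTree& t, vector<int> nodes) {
    stable_sort(nodes.begin(), nodes.end(),
                [&](int a, int b) { return t.dep[a] < t.dep[b]; });
    set<pair<int,int>> byTin;                  // (tin, node), sorted by tin
    vector<int> res(t.par.size(), -1);
    for (int v : nodes) {
        int best = -1;
        auto it = byTin.lower_bound({t.tin[v], -1});
        if (it != byTin.end()) {               // tin-successor
            int w = t.lca(v, it->second);
            if (best == -1 || t.dep[w] > t.dep[best]) best = w;
        }
        if (it != byTin.begin()) {             // tin-predecessor
            int w = t.lca(v, prev(it)->second);
            if (best == -1 || t.dep[w] > t.dep[best]) best = w;
        }
        res[v] = best;
        byTin.insert({t.tin[v], v});
    }
    return res;
}
```

The key property being exploited: among all already-inserted nodes, the one giving the deepest LCA with v is always one of v's two neighbors in tin order, which is why a plain `std::set` with `lower_bound`/`prev` suffices.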

Merging two Segment Trees

Now, let’s backtrack a bit and focus on how to merge two segment trees. Essentially, we traverse both segment trees simultaneously, and we only recurse as long as there are corresponding nodes in both trees. Let the trees be T_1 and T_2, and say we are currently at node X. If the left child of X is null in T_1 and some node Y in T_2, then after merging we just attach Y as the left child of X in the updated version, instead of going further down. However, if both left children are non-null, then we recursively merge them and attach the result as the left child. Note that this ensures we will visit at most the nodes of the smaller segment tree, and probably fewer.
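In code, the merge just described looks roughly like this. To keep the sketch short this version is non-persistent and destructive (it reuses the nodes of one input); the persistent variant would clone each visited node instead. Names are illustrative:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Node-pool dynamic segment tree with the classic "segment tree merge".
// Node 0 is null; real work in merge() happens only when BOTH sides are
// non-null, which is what bounds the total merging cost by O(N log N).
struct MergeTree {
    vector<int> sum{0}, lc{0}, rc{0};
    int newNode() { sum.push_back(0); lc.push_back(0); rc.push_back(0); return (int)sum.size() - 1; }
    int insert(int t, int l, int r, int pos) {      // returns (new) root
        if (t == 0) t = newNode();
        sum[t]++;
        if (l == r) return t;
        int m = (l + r) / 2;
        if (pos <= m) lc[t] = insert(lc[t], l, m, pos);
        else          rc[t] = insert(rc[t], m + 1, r, pos);
        return t;
    }
    int merge(int a, int b) {
        if (a == 0 || b == 0) return a + b;         // attach the non-null side
        sum[a] += sum[b];                           // both exist: combine
        lc[a] = merge(lc[a], lc[b]);
        rc[a] = merge(rc[a], rc[b]);
        return a;
    }
    int query(int t, int l, int r, int x, int y) {
        if (t == 0 || y < l || r < x) return 0;
        if (x <= l && r <= y) return sum[t];
        int m = (l + r) / 2;
        return query(lc[t], l, m, x, y) + query(rc[t], m + 1, r, x, y);
    }
};
```

The one-liner `return a + b;` is the "attach the child pointer" step from the text: if either side is null, the other subtree is reused wholesale with no traversal at all.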

But what is the complexity?

First, notice that every time we visit a node during merging, we create a new node due to persistence. Hence, if we can put an upper bound on the number of nodes created, we get the complexity. Again, remember that when we call our merge function with trees T_1 and T_2, we only traverse those nodes which are present in both trees.

For a moment, let’s forget about persistence. Then we essentially traverse two corresponding nodes and “merge” them into one. Effectively, if T_2 was the segment tree of a light child, its nodes are never visited again, since only the tree of the heavy child is pushed upwards. Hence, for every pair of corresponding nodes visited in T_1 and T_2, we “discard” one node. Also note that unless both corresponding nodes exist, we don’t visit either – we just attach the child pointers to the merged node. How many nodes are created and visited in total (still ignoring persistence)? We are essentially inserting N numbers overall, and in a dynamic segment tree that corresponds to O(NlogN) nodes. Hence we can discard only O(NlogN) nodes, which means the total number of visited nodes is O(2NlogN), which is O(NlogN).

Now, let’s get back to persistence. Won’t it create many new nodes? No. Persistence only creates a new node when we visit one, and we just saw that we only ever visit O(NlogN) nodes. Hence, we will only create O(NlogN) extra nodes due to persistence, and overall our complexity remains O(NlogN).

[NOTE: I have first proved the bound without using persistence, and then elaborated on what changes when we use persistence.]

Potential Functions

Note that what I have explained in informal language could have been written concisely using potential functions. Consider the potential function P = sum of the sizes of the segment trees, ignoring persistence. An insert operation adds logn to P, so the maximum value of P is NlogN. Whenever we visit two corresponding nodes in T_1 and T_2, P decreases by 1. Hence, we only ever visit 2NlogN nodes in the worst case. Since persistence builds one additional node for every visited node, we have O(NlogN) extra nodes. Hence, our complexity is bounded by O(NlogN).

For more discussion on merging, and related problems, click here.

Final Complexity

So what is the final complexity? The merging of segment trees take O(NlogN) time, and we need to do an additional O(N) deletions, so even that takes O(NlogN) time. Hence, the total complexity is O(NlogN).


Setter's Code
using namespace std;
const int N = 4e5 + 11;
vector<int> v[N],q[N];
vector<pair<int,int> > s[N];
int pr[N],pr2[N];
int color[N],nx[N];
int tin[N],tout[N],deep[N],time_;
int d[N][18];
bool u[N];
int root[N];
int a[N*40],l1[N*40],r1[N*40];
int kk;
void dfs(int l)
    for (int j=1; j<=17; j++)
    for (int j=0; j<v[l].size(); j++)
bool cmp(int x, int y)
    return (tin[x]<tin[y]);
bool pred(int x, int y)
    return (tin[x]<=tin[y]&&tout[y]<=tout[x]);
int lca(int x, int y)
    if (pred(x,y)) return x;
    if (pred(y,x)) return y;
    for (int j=17; j>=0; j--)
        if (!pred(d[x][j],y)) x=d[x][j];
    return d[x][0];
void build(int i, int l, int r)
    if (l==r) {a[i]=0; return;}
    int mid=(l+r)>>1;
    kk++; l1[i]=kk; build(l1[i],l,mid);
    kk++; r1[i]=kk; build(r1[i],mid+1,r);
void update(int i1, int i2, int l, int r, int x, int t)
    while (1)
        int mid=(l+r)>>1;
        if (l==r) break;
        if (x<=mid)
            kk++; l1[i2]=kk;
        } else
            kk++; r1[i2]=kk;
int get(int i, int l, int r, int tl, int tr)
    if (tl<=l&&r<=tr) return a[i];
    if (tl>r||l>tr) return 0;
    int mid=(l+r)>>1;
    return get(l1[i],l,mid,tl,tr)+get(r1[i],mid+1,r,tl,tr);
void up()
    int n;
    for (int i=0; i<=n; i++)
    for (int i=2; i<=n; i++)
    for (int i=1; i<=n; i++)
    for (int col=1; col<=n; col++)
        if (q[col].size())
        vector<int> c;
        for (int j=1; j<q[col].size(); j++)
            int x=q[col][j];
            int y=q[col][j-1];
            int z=lca(x,y);
        for (int j=1; j<c.size(); j++)
            if (c[j]!=c[j-1])
            int x=c[j];
            int y=c[j-1];
            while (!pred(y,x))
        for (int j=c.size()-1; j>=1; j--)
            if (j==(int)c.size()-1||c[j]!=c[j+1])
            int x=c[j];
            if (u[x]) nx[x]=x;
            if (nx[pr2[x]]==0||deep[nx[pr2[x]]]>deep[nx[x]])
        for (int j=0; j<q[col].size(); j++)
    for (int i=1; i<=n; i++)
        int last=root[i-1];
        for (int j=0; j<s[i].size(); j++)
            int l=s[i][j].first,r=s[i][j].second;
            int now=kk;
            if (l!=0)
                if (l!=0) update(last,now,1,n,tin[l],-1);
    int m;
    int ans=0;
    for (int i=1; i<=m; i++)
        int x,d;
int main()
    ios_base::sync_with_stdio(0); cin.tie(0); cout.tie(0);
    int t;
    while (t--)
Tester's Code
#include <bits/stdc++.h>
using namespace std;
struct forest {
    vector<pair<int,int>> edges;
    vector<vector<int>> to, lg_parents;
    vector<int> sub, color, parent, depth, pre, ord, in, out;
    int comps, n, lgn, C;
    forest(int n): n(n) {
        sub.assign(n, 0); 
        color.assign(n, 0); 
        depth.assign(n, 0);
    void add_edge(int u, int v){
        int id = edges.size();
        assert(id < n - 1);
        edges.push_back(make_pair(u, v));
    inline int adj(int u, int id){ return u ^ edges[id].first ^ edges[id].second; }
    void dfs(int u, int p){
        in[u] = C++;
        color[u] = comps;
        parent[u] = p;
        sub[u] = 1;
        for(int id : to[u]){
            int v = adj(u, id);
            if(v == p) continue;
            depth[v] = depth[u] + 1;
            dfs(v, u);
            sub[u] += sub[v];
        out[u] = C++;
    bool is_ancestor(int u, int v){
        return in[u] <= in[v] && out[v] <= out[u];
    void dfs_all(){
        comps = 0;
        C = 0;
        for(int i = 0; i < n; i++){
                dfs(i, -1);
    void build_parents(){
        lgn = 0;
        while((1<<lgn) <= n) lgn++;
        lg_parents.assign(lgn, vector<int>(n, -1));
        for(int i = 0; i < n; i++)
            lg_parents[0][i] = parent[i];
        for(int i = 1; i < lgn; i++){
            for(int j = 0; j < n; j++){
                if(~lg_parents[i - 1][j]){
                    lg_parents[i][j] = lg_parents[i - 1][lg_parents[i - 1][j]];
    int jump(int u, int k){
        for(int i = lgn - 1; i >= 0; i--) if(k&(1<<i)) u = lg_parents[i][u];
        return u;
    int lca(int u, int v){
        if(depth[u] < depth[v]) swap(u, v);
        for(int i = lgn - 1; i >= 0; i--)
            if((depth[u] - depth[v])&(1<<i))
                u = lg_parents[i][u];
        if(u == v)
            return u;
        for(int i = lgn - 1; i >= 0; i--)
            if(lg_parents[i][u] != lg_parents[i][v]){
                u = lg_parents[i][u];
                v = lg_parents[i][v];
        return lg_parents[0][u];
    int dist(int u, int v){
        return depth[u] + depth[v] - 2 * depth[lca(u, v)];
namespace hist {
    const int node_pool = (400001) * 120, L = 0, R = 400001;
    int sum[node_pool], _l[node_pool], _r[node_pool], at = node_pool;
    int _get_node(int r){
        assert(at > 0);
        sum[at] = sum[r];
        _l[at] = _l[r];
        _r[at] = _r[r];
        return at;
    int add(int rt, int l, int r, int p, int v){
        rt = _get_node(rt);
        if(l == r) sum[rt] += v;
        else {
            int m = (l + r)>>1;
            if(p <= m) _l[rt] = add(_l[rt], l, m, p, v);
            else _r[rt] = add(_r[rt], m + 1, r, p, v);
            sum[rt] = sum[_l[rt]] + sum[_r[rt]];
        return rt;
    int add(int rt, int p, int v){
        return add(rt, L, R, p, v);
    int get(int rt, int l, int r, int x, int y){
        if(rt == 0) return 0;
        if(l > y || r < x) return 0;
        if(x <= l && r <= y) return sum[rt];
        int m = (l + r)>>1;
        return get(_l[rt], l, m, x, y) + get(_r[rt], m + 1, r, x, y);
    int get(int rt, int x, int y){
        return get(rt, L, R, x, y);
int main(){
    int t;
    scanf("%d", &t);
        hist::at = hist::node_pool;
        int n;
        scanf("%d", &n);
        vector<int> a(n);
        forest fo(n);
        for(int i = 1; i < n; i++){
            int p; scanf("%d", &p); p--;
            fo.add_edge(p, i);
        vector<vector<int>> group(n + 1);
        for(int i = 0; i < n; i++) {
            scanf("%d", &a[i]);
        vector<int> parent(n);
        vector<pair<int,int>> best(n);
        vector<vector<pair<int,int>>> operations(n);
        for(int i = 0; i <= n; i++){
                sort(group[i].begin(), group[i].end(), [&](int u, int v){ return fo.in[u] < fo.in[v]; });
                int len = group[i].size();
                for(int j = 0; j + 1 < len; j++) 
                    group[i].push_back(fo.lca(group[i][j], group[i][j + 1]));
                sort(group[i].begin(), group[i].end(), [&](int u, int v){ return fo.in[u] < fo.in[v]; });
                group[i].erase(unique(group[i].begin(), group[i].end()), group[i].end());
                for(int node : group[i]){
                    parent[node] = -1;
                    best[node] = {1<<30, 1<<30};
                    if(a[node] == i)
                        best[node] = {fo.depth[node], node};
                len = group[i].size();
                int last = 0;
                for(int j = 1; j < len; j++){
                    int node = group[i][j];
                    while(last != 0 && !fo.is_ancestor(last, node)) last = parent[last];
                    parent[node] = last;
                    last = node;
                len = group[i].size();
                for(int j = len - 1; j >= 0; j--){
                    int u = group[i][j];
                    int p = parent[u];
                    operations[best[u].first].push_back({fo.in[u], 1});
                    if(p != -1){
                        best[p] = min(best[p], best[u]);
                        operations[best[u].first].push_back({fo.out[fo.jump(u, fo.depth[u] - fo.depth[p] - 1)], -1});
        vector<int> pers(n);
        for(int i = 0; i < n; i++){
            if(i) pers[i] = pers[i - 1];
            for(auto op : operations[i])
                pers[i] = hist::add(pers[i], op.first, op.second);
        int q;
        scanf("%d", &q);
        int ans = 0;
            int X, D;
            scanf("%d %d", &X, &D);
            int x = X ^ ans;
            int y = D ^ ans;
            printf("%d\n", (ans = hist::get(pers[min(n - 1, fo.depth[x] + y)], fo.in[x], fo.out[x] - 1)));
    return 0;

Please give me suggestions if anything is unclear so that I can improve. Thanks :slight_smile:


Is O(Nlog^2N) not intended to pass?

Your editorials seem funny and complete to me, but you asked for suggestions, so I give you one for your consideration.

It’s nice to have some insets (name?), motivate some explanations with little thoughts, observations, questions and comments about suboptimal approaches, but the editorial loses clarity if it’s full of those things. I would prefer a full “linear” explanation of the approach included somewhere in the editorial. By “linear” I mean without divergent paths of thought that could be included at the end (or the beginning) or within the insets.

That is just my suggestion. It is entirely possible that I’m the only one who finds a traditional explanation more useful.


Same question.

I tried to optimize my solution a lot but failed to get it through and instead of thinking of a O(N log N) solution, I was cursing the author all the time for putting such a strict time limit. FML!

What are online queries and offline queries ?

offline queries: you know all the queries at once
online queries: you need to answer each query to move to the next one

In this problem, the XOR operation needed to decode the parameters of each query means you must know the answer of a query before you can know the parameters of the next one

1 Like

I first had the idea explained here, but I could not get it to pass within the time limit, even with O3 optimization. Maybe I’m very bad at constant optimization :(, I even read input character by character to reduce IO time. Also I find it hard to believe that an \mathcal{O}(n \lg^2 n + q \lg n) solution passes, especially since there are 5 test cases with n = 2 \cdot 10^5 each (I checked with assert).

My \mathcal{O}((n + q)\lg n) solution :

Firstly re-number the vertices in dfs order, now every sub-tree is in a continuous range. Specifically sub-tree at X contains nodes (re-numbered) [l, r] where l is the dfs “intime” (number of nodes visited when you enter dfs at node X) and r is dfs “extime” (number of nodes visited by dfs when you leave X for the last time).

The second crucial observation is that given a query X, D, (say depth of X is d), we need to consider all nodes whose depths are between d and d + D and inside the range [l, r] that sub-tree at X represents. We should ignore nodes having depth greater than d+D, however we can leave nodes with depth less than d since they won’t lie in [l, r]. So we shall consider the tree to be empty initially and update node values in bfs order. Once we have inserted all nodes until depth d+D, we are ready for our query. To answer queries online, use persistent structures.

It is really hard to answer "number of distinct values in range [l, r]" in general. But in this case we can use some tricks. Say node X has value v. We need to consider X only for the queries concerning those ancestors of X that do not already have another node with value v inserted before. This ancestor is the one among lca(X_{pred}, X) and lca(X, X_{succ}) with greater depth (where X_{pred} and X_{succ} are the predecessor and successor of X in dfs order among nodes with value v). We will add 1 at node X and subtract 1 at that ancestor. Now a sub-tree sum equals the number of distinct values.

We maintain only the versions of the tree after inserting all nodes at depth d for each d.
To answer the query lookup version \min\{d+D, d_{max}\} and calculate range [l, r] sum for that tree.
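For reference, the static version of this trick (all depths already inserted, a plain prefix-sum array instead of a persistent tree, and the simpler adjacent-pairs formulation) can be sketched like this; the naive LCA and all names are mine:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Static +1/-1 trick: for each value, sort its nodes by DFS in-time, add
// +1 at every node and -1 at the LCA of each adjacent pair. Then the
// number of distinct values in the subtree of X is the sum over the DFS
// range [in[X], out[X]], here answered via prefix sums.
struct DistinctInSubtree {
    int n, timer = 0;
    vector<vector<int>> ch;
    vector<int> par, dep, in, out, delta;
    DistinctInSubtree(const vector<int>& parent, const vector<int>& val)
        : n((int)parent.size()), ch(n), par(parent),
          dep(n, 0), in(n, 0), out(n, 0), delta(n, 0) {
        for (int v = 1; v < n; v++) ch[parent[v]].push_back(v);
        dfs(0);
        map<int, vector<int>> byVal;
        for (int v = 0; v < n; v++) byVal[val[v]].push_back(v);
        for (auto& [value, nodes] : byVal) {
            sort(nodes.begin(), nodes.end(),
                 [&](int a, int b) { return in[a] < in[b]; });
            for (int v : nodes) delta[in[v]]++;
            for (int i = 0; i + 1 < (int)nodes.size(); i++)
                delta[in[lca(nodes[i], nodes[i + 1])]]--;
        }
        partial_sum(delta.begin(), delta.end(), delta.begin());
    }
    void dfs(int v) {
        in[v] = timer++;
        for (int c : ch[v]) { dep[c] = dep[v] + 1; dfs(c); }
        out[v] = timer - 1;               // inclusive DFS range
    }
    int lca(int u, int v) {               // naive, fine for a sketch
        while (u != v) { if (dep[u] < dep[v]) swap(u, v); u = par[u]; }
        return u;
    }
    int query(int x) {                    // distinct values in subtree of x
        return delta[out[x]] - (in[x] ? delta[in[x] - 1] : 0);
    }
};
```

The comment's actual solution additionally versions this structure by depth (one persistent version per BFS level) so that the d+D cutoff can be applied at query time.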



I just wanted to know if O(q.log^2n) solution would pass for subtask 2 where the tree is a bamboo. I don’t know why my solution is timing out for subtask 2


I struggled to implement this same approach for two days, but I couldn’t get over the worry that it might happen that with every update you need to rebuild the entire persistent tree. What if at each depth d you need to update both a node with a large tin and a node with a low tin?

The problem was mainly in your Merge function. Also you should avoid to flush the output each time. Check your modified solution and see if you can find the changes :wink:

1 Like

Hey thank you so much I just saw the changes (:

did you know that the std library has a merge function? If you have to merge two sorted vectors v1 and v2 and output the result in a vector v, just write this

vector<int> v(v1.size()+v2.size());
merge(v1.begin(), v1.end(), v2.begin(), v2.end(), v.begin());

Nop :stuck_out_tongue:

Thanks for the suggestion. From next time, I’ll include the straightforward version as well at the beginning.
I just write it in this format because often I have felt that the “how to think/motivation” part is as important as the solution. But what you said also makes sense, since experienced coders often might not want all those details :stuck_out_tongue:.


Hi, I went for subtask-2 in this problem. I tried to do it with a segment tree of sets but it gave TLE ( https://www.codechef.com/viewsolution/34245220 )
I also tried with a segment tree of unordered-set but it also gave TLE (https://www.codechef.com/viewsolution/34245514)
Please tell me why it is better to use a segment tree of vectors than sets.
@rajarshi_basu @rahulmysuru7 Can you please have a look at my solution.

It’s too long to read, I guess a point-to-point, direct solution (traditional approach) would make it better.
Moreover, we are going to have video editorials; there we can discuss our first thought process and shortcomings.

I had googled this but couldn’t find it, Thanks for sharing :slight_smile:

1 Like

Yep, I realised this might be boring for experienced coders. I will include a more direct solution from next time as well.

in offline queries, the queries are known all at once
on the other hand, in online queries, each one has to be answered before you can read or decode the next one

i read about this in geeksforgeeks site sometime back, not too sure

Hello, can anyone please provide the links for where to read the topic small to large merging as mentioned in the prerequisites?