# PROBLEM LINK:

Author: Sofiia

Tester: Felipe Mota

Editorialist: Rajarshi Basu

# DIFFICULTY:

Medium-Hard

# PREREQUISITES:

Persistent Segment Trees, Small-to-Large-Merging

# PROBLEM:

Danya has a rooted tree with N \leq 10^5 nodes (numbered 1 through N). Node 1 is the root and for each i (2 \le i \le N), the parent of the i-th node is p_i. For each valid i, the i-th node also has a value a_i.

Danya really loves his tree and wants to play with it. You should answer his Q queries. In each query:

- You are given two integers X and D.
- Consider the set S of all nodes v that lie in the subtree of node X (including X) such that the distance between X and v does not exceed D.
- The answer to this query is the number of different values a_v among all v \in S.

Note that there are 5 such test cases, and that the queries are online.

# QUICK EXPLANATION:

## I promise this will be quick!

The solution hinges on the following fact: if, for every subtree, we can maintain the depth of the highest node of every colour inside it, then a persistent segment tree lets us count how many of those depths are within D. To merge these segment trees fast, traverse along the smaller segment tree and merge. Note that this does not by itself handle the case where the depth of a colour gets updated: if the two segment trees contained two different depths of the same colour, we would have accounted for both. To correct this, we also need to delete the occurrence of a colour when it becomes “deactivated”. To find the first ancestor at which a particular node (and thereby the depth of the colour associated with that node) gets deactivated, we can use DFS start and end times. The fact that only O(n) such removals are necessary keeps the complexity O(nlogn).

# EXPLANATION:

## Observation 1

First off, one might be tempted to think “But oh! We are dealing with subtrees, so why not Euler tour, DFS array, etc.?” The catch is the variable D, and that makes everything related to a subtree being a contiguous range in an array a tad hard to do.

## But....

One idea you might have is: can’t we just add the distance variable as another “dimension” in our search parameter, and then count distinct values in a range using segment trees? (More detail about the technique here.) Well, maybe. But it’s really hard to do. If you can figure out a fast enough way to do so, do mention it in the comments.

## Observation 2

In general, a good idea is to simplify our problem and try to solve it in that simpler setting. What better simplification here than to do away with D? Note that we don’t want to go back to thinking about subtree-range transforms, since we already know it’s somewhat hard to do. Nonetheless… what can we do?

## A possible solution after Observation 2

Well, since there is no D and we are counting the number of distinct colours in a subtree, a popular technique for this is “small to large” merging. (Click here and here to learn more.)

## Offline or online?

Well, it is possible to store the answer for all vertices and answer queries online, but that feels like cheating. Instead, can we make it actually online?

## Can you?

Turns out, you can answer queries online. Just make whatever data structure you are using to store the “set” of all colours persistent.

In particular, when you are doing the DFS and merging subtrees, while inserting elements into the “set” of the heavy child, you get back a new version, which you then assign to the current node. In this way, you still maintain the “live” version of the data structure at each node, and can truly answer queries online! Note that “set” doesn’t necessarily mean an STL set, but rather a set-like data structure of your choice.

## Implementation 1

Whenever we are using persistence, it is a good idea to use a segment tree as the underlying data structure. Hence, in the above case, we maintain the “set” in the form of a segment tree. I will leave the details of which array we build the segment tree over as an exercise.

## Implementation 2

A question might be: won’t segment trees take a lot of space? For example, if we built a segment tree on the frequency array of colours (where inserting a node of colour C just does Arr[C]++), won’t we need to store a segment tree of size 10^5 at each node? Won’t that take huge amounts of memory? Oh… persistence… But wait! We are using persistence only to merge segment trees. What if it were a star graph, i.e., node 1 connected to all of nodes 2,3,4 \dots N? Wouldn’t that take O(N^2) memory? Well, yes.

HOWEVER, we can reduce the memory usage effectively. The trick is to use dynamic segment trees.

## Observation 3

Let the depth of u be X. Then any node in the subtree of u with depth at most X+D is valid. Seems rather \dots uniform, right?

## Observation 4

Whenever we are counting something, it is a good idea to “fix” a certain property of an element, and count using that. What can such a “property” of the colours (or of the nodes carrying them) be, so that we can keep track of the depth of each colour and whether it lies within distance D of the given node u?

## Using the above observations

We can just store the least-depth occurrence of every colour! If we maintain only those numbers, then answering a subtree query is just counting how many of them have depth < Y for some Y. (Don’t forget that whatever we maintain, we will use persistent data structures so that we can answer queries online.)

## How can we maintain?

It seems appropriate to use a segment tree on the frequency table on depths of nodes. Right? RIGHT?

## Okay I'll explain

Here’s what I mean. Let the underlying array of the segment tree be called Arr. Let’s say we insert the node u, which has colour c. Then the operation we want to support is Arr[depth[u]]++. Notice that we aren’t directly using the value c. However, it is still needed, because we want to ensure that only one occurrence of a node with colour c exists, and that the depth[u] of that node u is the minimum possible (since we want to maintain the highest occurrence of each colour in a subtree). Hence, when merging two subtrees, let’s say that in the data structure of the heavy child the highest node of colour c was v, in the light child the best occurrence is u, and depth[v] > depth[u]. Then we do the following:

- Arr[depth[v]]--
- Arr[depth[u]]++

To answer a query at node u, we go to its version of the segment tree and take the range sum over [depth[u], depth[u]+D]. (Think about why.)

## Initial Attempt at Complexity Analysis

The complexity analysis is simple for this. We use the idea from small to large, and since inserting an element into a segment tree costs O(logn), we get a total complexity of O(nlog^2n + qlogn).

## Making merging faster

A faster way to merge, instead of putting elements one by one into the segment tree, is to traverse both segment trees simultaneously and merge them. The claim is that this makes merging the two segment trees fast! (Proof given below.) The only catch: while merging, we just merge the nodes; we don’t make any corrective updates. To explain what I mean: let’s say we are merging two segment trees T_1 and T_2, and there exist u \in T_1 and v \in T_2 such that the colour of both u, v is c. Then, after merging, our segment tree will contain both occurrences. How do we delete that extra occurrence, like in our previous implementation? IT’S HARD, right? I mean, now you don’t keep track of which colour contributed to which depth; you just keep track of the depths in the segment tree. Maybe you can keep track of the nodes and their colours separately as well… but can you? Won’t that ruin the complexity, since just maintaining that information by small-to-large needs O(nlog^2n) time?

## How to handle repeated occurrence

Suppose we can find, for each node, the first ancestor at which it will need to be “removed” (that is, the first ancestor where it gets replaced by another occurrence of the same colour, from a different subtree, with smaller depth). Formally, for every node u of colour c we need to find the first ancestor x, with depth[x] as large as possible, such that there exists a node v of colour c with depth[u] > depth[v] and LCA(u,v) = x. Then, after we merge the segment trees at node x, for all such nodes u we update the segment tree as Arr[depth[u]] -= 1. This keeps everything correct. (Think about why this is equivalent to our previous one-by-one merging: we are doing the same additions and subtractions, just in a roundabout way.) Since only O(n) such deletions are required, we incur a total time of O(nlogn) for the mergings.

## How to find these ancestors?

## Hint 1:

Process each color one by one.

## Hint 2:

Use depths and the Euler tour.

## Hint 3:

Only the LCAs of the nodes matter.

## Hint 4:

Only the LCAs of adjacent nodes (in DFS order) matter.

## Details

For every colour, sort its nodes by increasing depth. Now let’s say you are processing the x^{th} element, and you have already inserted everything with smaller depth into a data structure. Suppose that inside this data structure, everything is sorted by DFS start time of the nodes. Then we just need to check the LCA of x with the previous and next nodes in that order. You can simply use a set for the data structure, as sets provide iterators to move to the next and previous elements easily.

## Merging two Segment Trees

Now, let’s backtrack a bit and focus on how to merge the segment trees. Essentially, we traverse both segment trees simultaneously, and we do this only as long as there are corresponding nodes in both trees. Let the trees be T_1 and T_2, and say we are currently at node X. Suppose the left child of X is null in T_1 and some node Y in T_2. Then after merging, we just attach Y as the left child of X in the updated version, instead of going further down. However, if both left children are non-null, we recursively merge them and attach the result as the left child. Note that this ensures we visit at most the nodes of the smaller segment tree, and probably fewer.

## But what is the complexity?

First, notice that every time we visit a node during merging, we create a new node due to persistence. Hence, if we can put an upper bound on the number of nodes created, we get a complexity bound. Again, remember that when we call our merge function on trees T_1 and T_2, we only traverse nodes that are present in both trees. For a moment, let’s forget about persistence. Then you basically traverse two nodes and “merge” them into one. Effectively, if T_2 was the segment tree of a light child, you won’t visit the nodes of T_2 again, as only the tree of the heavy child is pushed upwards. Hence, for every two corresponding nodes that you visit in T_1 and T_2, you “discard” one node. Also note that unless both corresponding nodes exist, you don’t visit either; you just attach the child pointers to the merged node. How many total nodes will be created and visited in this scenario? (Note that we aren’t considering persistence here.) It’s O(NlogN), since we are essentially inserting N numbers overall, and in a dynamic segment tree that corresponds to O(NlogN) memory. Hence, we can only ever discard O(NlogN) nodes, which means the total number of visited nodes is at most O(2NlogN), which is O(NlogN). Now, let’s get back to persistence. Won’t that create many new nodes? No. Note that persistence only creates a new node when we visit a node, and we just saw that we only ever visit O(NlogN) nodes. Hence, we will only create O(NlogN) extra nodes due to persistence, and overall our complexity remains O(NlogN).

NOTE: I have first proved the bound without using persistence, and then elaborated on what changes when we use persistence.

## Potential Functions

Note that I have explained in informal language what could have been written concisely using a potential function. Consider the potential function P = sum of sizes of the segment trees (ignoring persistence). Each insert operation adds logn to P, so the maximum value of P is NlogN. Whenever we visit two corresponding nodes in T_1 and T_2, we subtract 1 from P. Hence, we only ever visit 2NlogN nodes in the worst case. Since persistence builds one additional node for every visited node, we get O(NlogN) extra nodes. Hence, our complexity is bounded by O(NlogN).

For more discussion on merging and related problems, click here.

## Final Complexity

So what is the final complexity? The merging of segment trees take O(NlogN) time, and we need to do an additional O(N) deletions, so even that takes O(NlogN) time. Hence, the total complexity is O(NlogN).

# SOLUTIONS:

## Setter's Code

```cpp
#include<bits/stdc++.h>
using namespace std;
const int N = 4e5 + 11;
vector<int> v[N],q[N];
vector<pair<int,int> > s[N];
int pr[N],pr2[N];
int color[N],nx[N];
int tin[N],tout[N],deep[N],time_;
int d[N][18];
bool u[N];
int root[N];
int a[N*40],l1[N*40],r1[N*40];
int kk;
void dfs(int l)
{
time_++;
tin[l]=time_;
d[l][0]=pr[l];
for (int j=1; j<=17; j++)
d[l][j]=d[d[l][j-1]][j-1];
deep[l]=deep[pr[l]]+1;
for (int j=0; j<v[l].size(); j++)
dfs(v[l][j]);
tout[l]=time_;
}
bool cmp(int x, int y)
{
return (tin[x]<tin[y]);
}
bool pred(int x, int y)
{
return (tin[x]<=tin[y]&&tout[y]<=tout[x]);
}
int lca(int x, int y)
{
if (pred(x,y)) return x;
if (pred(y,x)) return y;
for (int j=17; j>=0; j--)
if (!pred(d[x][j],y)) x=d[x][j];
return d[x][0];
}
void build(int i, int l, int r)
{
if (l==r) {a[i]=0; return;}
int mid=(l+r)>>1;
kk++; l1[i]=kk; build(l1[i],l,mid);
kk++; r1[i]=kk; build(r1[i],mid+1,r);
a[i]=0;
}
void update(int i1, int i2, int l, int r, int x, int t)
{
while (1)
{
int mid=(l+r)>>1;
a[i2]=a[i1]+t;
if (l==r) break;
if (x<=mid)
{
kk++; l1[i2]=kk;
r1[i2]=r1[i1];
r=mid;
i1=l1[i1];
i2=l1[i2];
} else
{
l1[i2]=l1[i1];
kk++; r1[i2]=kk;
i1=r1[i1];
i2=r1[i2];
l=mid+1;
}
}
}
int get(int i, int l, int r, int tl, int tr)
{
if (tl<=l&&r<=tr) return a[i];
if (tl>r||l>tr) return 0;
int mid=(l+r)>>1;
return get(l1[i],l,mid,tl,tr)+get(r1[i],mid+1,r,tl,tr);
}
void up()
{
int n;
cin>>n;
kk=0;
time_=0;
for (int i=0; i<=n; i++)
{
u[i]=0;
root[i]=0;
v[i].clear();
tin[i]=0;
tout[i]=0;
q[i].clear();
nx[i]=0;
s[i].clear();
pr[i]=0;
pr2[i]=0;
}
for (int i=2; i<=n; i++)
{
cin>>pr[i];
v[pr[i]].push_back(i);
}
dfs(1);
tout[0]=n;
for (int i=1; i<=n; i++)
{
cin>>color[i];
q[color[i]].push_back(i);
}
for (int col=1; col<=n; col++)
if (q[col].size())
{
sort(q[col].begin(),q[col].end(),cmp);
vector<int> c;
c.push_back(q[col][0]);
u[q[col][0]]=1;
c.push_back(0);
for (int j=1; j<q[col].size(); j++)
{
int x=q[col][j];
int y=q[col][j-1];
u[x]=1;
int z=lca(x,y);
c.push_back(x);
c.push_back(z);
}
nx[0]=0;
sort(c.begin(),c.end(),cmp);
for (int j=1; j<c.size(); j++)
if (c[j]!=c[j-1])
{
int x=c[j];
int y=c[j-1];
while (!pred(y,x))
y=pr2[y];
pr2[x]=y;
nx[x]=0;
}
for (int j=c.size()-1; j>=1; j--)
if (j==(int)c.size()-1||c[j]!=c[j+1])
{
int x=c[j];
if (u[x]) nx[x]=x;
if (nx[pr2[x]]==0||deep[nx[pr2[x]]]>deep[nx[x]])
nx[pr2[x]]=nx[x];
s[deep[nx[x]]].push_back(make_pair(pr2[x],x));
}
for (int j=0; j<q[col].size(); j++)
u[q[col][j]]=0;
}
root[0]=1;
kk=1;
build(1,1,n);
for (int i=1; i<=n; i++)
{
int last=root[i-1];
for (int j=0; j<s[i].size(); j++)
{
int l=s[i][j].first,r=s[i][j].second;
kk++;
int now=kk;
update(last,now,1,n,tin[r],1);
last=now;
if (l!=0)
{
kk++;
now=kk;
if (l!=0) update(last,now,1,n,tin[l],-1);
last=now;
}
}
root[i]=last;
}
int m;
cin>>m;
int ans=0;
for (int i=1; i<=m; i++)
{
int x,d;
cin>>x>>d;
x=(x^ans);
d=(d^ans);
ans=get(root[min(n,deep[x]+d)],1,n,tin[x],tout[x]);
cout<<ans<<"\n";
}
}
int main()
{
ios_base::sync_with_stdio(0); cin.tie(0); cout.tie(0);
int t;
cin>>t;
while (t--)
{
up();
}
}
```

## Tester's Code

```cpp
#include <bits/stdc++.h>
using namespace std;
struct forest {
vector<pair<int,int>> edges;
vector<vector<int>> to, lg_parents;
vector<int> sub, color, parent, depth, pre, ord, in, out;
int comps, n, lgn, C;
forest(int n): n(n) {
to.resize(n);
sub.assign(n, 0);
color.assign(n, 0);
parent.resize(n);
depth.assign(n, 0);
in.resize(n);
out.resize(n);
}
void add_edge(int u, int v){
int id = edges.size();
assert(id < n - 1);
edges.push_back(make_pair(u, v));
to[u].push_back(id);
to[v].push_back(id);
}
inline int adj(int u, int id){ return u ^ edges[id].first ^ edges[id].second; }
void dfs(int u, int p){
pre.push_back(u);
in[u] = C++;
color[u] = comps;
parent[u] = p;
sub[u] = 1;
for(int id : to[u]){
int v = adj(u, id);
if(v == p) continue;
depth[v] = depth[u] + 1;
dfs(v, u);
sub[u] += sub[v];
}
out[u] = C++;
ord.push_back(u);
}
bool is_ancestor(int u, int v){
return in[u] <= in[v] && out[v] <= out[u];
}
void dfs_all(){
comps = 0;
C = 0;
for(int i = 0; i < n; i++){
if(!color[i]){
++comps;
dfs(i, -1);
}
}
}
void build_parents(){
lgn = 0;
while((1<<lgn) <= n) lgn++;
lg_parents.assign(lgn, vector<int>(n, -1));
for(int i = 0; i < n; i++)
lg_parents[0][i] = parent[i];
for(int i = 1; i < lgn; i++){
for(int j = 0; j < n; j++){
if(~lg_parents[i - 1][j]){
lg_parents[i][j] = lg_parents[i - 1][lg_parents[i - 1][j]];
}
}
}
}
int jump(int u, int k){
for(int i = lgn - 1; i >= 0; i--) if(k&(1<<i)) u = lg_parents[i][u];
return u;
}
int lca(int u, int v){
if(depth[u] < depth[v]) swap(u, v);
for(int i = lgn - 1; i >= 0; i--)
if((depth[u] - depth[v])&(1<<i))
u = lg_parents[i][u];
if(u == v)
return u;
for(int i = lgn - 1; i >= 0; i--)
if(lg_parents[i][u] != lg_parents[i][v]){
u = lg_parents[i][u];
v = lg_parents[i][v];
}
return lg_parents[0][u];
}
int dist(int u, int v){
return depth[u] + depth[v] - 2 * depth[lca(u, v)];
}
};
namespace hist {
const int node_pool = (400001) * 120, L = 0, R = 400001;
int sum[node_pool], _l[node_pool], _r[node_pool], at = node_pool;
int _get_node(int r){
--at;
assert(at > 0);
sum[at] = sum[r];
_l[at] = _l[r];
_r[at] = _r[r];
return at;
}
int add(int rt, int l, int r, int p, int v){
rt = _get_node(rt);
if(l == r) sum[rt] += v;
else {
int m = (l + r)>>1;
if(p <= m) _l[rt] = add(_l[rt], l, m, p, v);
else _r[rt] = add(_r[rt], m + 1, r, p, v);
sum[rt] = sum[_l[rt]] + sum[_r[rt]];
}
return rt;
}
int add(int rt, int p, int v){
return add(rt, L, R, p, v);
}
int get(int rt, int l, int r, int x, int y){
if(rt == 0) return 0;
if(l > y || r < x) return 0;
if(x <= l && r <= y) return sum[rt];
int m = (l + r)>>1;
return get(_l[rt], l, m, x, y) + get(_r[rt], m + 1, r, x, y);
}
int get(int rt, int x, int y){
return get(rt, L, R, x, y);
}
};
int main(){
int t;
scanf("%d", &t);
while(t--){
hist::at = hist::node_pool;
int n;
scanf("%d", &n);
vector<int> a(n);
forest fo(n);
for(int i = 1; i < n; i++){
int p; scanf("%d", &p); p--;
fo.add_edge(p, i);
}
fo.dfs_all();
fo.build_parents();
vector<vector<int>> group(n + 1);
for(int i = 0; i < n; i++) {
scanf("%d", &a[i]);
group[a[i]].push_back(i);
}
vector<int> parent(n);
vector<pair<int,int>> best(n);
vector<vector<pair<int,int>>> operations(n);
for(int i = 0; i <= n; i++){
if(group[i].size()){
sort(group[i].begin(), group[i].end(), [&](int u, int v){ return fo.in[u] < fo.in[v]; });
int len = group[i].size();
group[i].push_back(0);
for(int j = 0; j + 1 < len; j++)
group[i].push_back(fo.lca(group[i][j], group[i][j + 1]));
sort(group[i].begin(), group[i].end(), [&](int u, int v){ return fo.in[u] < fo.in[v]; });
group[i].erase(unique(group[i].begin(), group[i].end()), group[i].end());
for(int node : group[i]){
parent[node] = -1;
best[node] = {1<<30, 1<<30};
if(a[node] == i)
best[node] = {fo.depth[node], node};
}
len = group[i].size();
int last = 0;
for(int j = 1; j < len; j++){
int node = group[i][j];
while(last != 0 && !fo.is_ancestor(last, node)) last = parent[last];
parent[node] = last;
last = node;
}
len = group[i].size();
for(int j = len - 1; j >= 0; j--){
int u = group[i][j];
int p = parent[u];
operations[best[u].first].push_back({fo.in[u], 1});
if(p != -1){
best[p] = min(best[p], best[u]);
operations[best[u].first].push_back({fo.out[fo.jump(u, fo.depth[u] - fo.depth[p] - 1)], -1});
}
}
}
}
vector<int> pers(n);
for(int i = 0; i < n; i++){
if(i) pers[i] = pers[i - 1];
for(auto op : operations[i])
pers[i] = hist::add(pers[i], op.first, op.second);
}
int q;
scanf("%d", &q);
int ans = 0;
while(q--){
int X, D;
scanf("%d %d", &X, &D);
int x = X ^ ans;
int y = D ^ ans;
x--;
printf("%d\n", (ans = hist::get(pers[min(n - 1, fo.depth[x] + y)], fo.in[x], fo.out[x] - 1)));
}
}
return 0;
}
```

Please give me suggestions if anything is unclear, so that I can improve. Thanks!