TREEDIST - Editorial


Contest - Division 3

Contest - Division 2

Contest - Division 1




Tree DP


Given a rooted tree of N nodes. Determine the sum of distance between nodes x and y over all tuples (u,v,x,y) such that

  • u < v,
  • u is the ancestor of x, v is the ancestor of y,
  • neither u or v are ancestors of each other.


The problem can be equivalently restated as:

Given a rooted tree of N nodes. Define cost(x,y) as:


where p is the lowest common ancestor of x,y.
Determine the sum of cost(x,y) over all ordered pairs (x,y).

Start by expanding the above equation to:

(dist(x,p)+dist(p,y))*dist(x,p)*dist(y,p) \\ = dist(x,p)^2*dist(y,p)+dist(y,p)^2*dist(x,p)

Since the above equation is dependent on p in each term, we can try finding the answer for each p, rather than each ordered pair (x,y).

Let X_p be the set of direct children of node p. Then, let C_{p,i} be the set of nodes in the subtree of node X_{p,i} (including node X_{p,i}) for all valid i.

Slight mathematical definition

\sum dist(S,p) represents the sum of distances between p and every node in set S.
\sum dist(S,p)^2 represents the sum of the square of distances between p and every node in set S.

The above equation, applied to node p is:

\sum_{i<j} \Big(\sum dist(C_{p,i},p)^2*\sum dist(C_{p,j},p)+\sum dist(C_{p,j},p)^2*\sum dist(C_{p,i},p)\Big)

Bingo! We now have a formula for the answer that doesn’t depend on the LCA function. How is that very helpful? The simplification gives the problem an elegant tree DP solution.

We jump to implementation. Let’s calculate the answer, by calculating the value of the above expression for each p and adding them together.

Define l_{X_{p,i}} and l2_{X_{p,i}} as \sum dist(C_{p,i},p) and \sum dist(C_{p,i},p)^2 respectively. Both arrays can be computed in O(N) using tree DP.


I leave the proof’s to the reader, as they are trivial to deduce.

Define sz_u as the number of nodes in the subtree of node x (including itself).
The recurrence relations are then:

l_u= \sum (l_{X_u} + sz_{X_u})


l2_u = \sum (l2_{X_u}+ 2*l_{X_u} + sz_{X_u})

which can be computed recursively, using traditional tree DP.

(Start with deriving the first equation, which is done trivially. To then derive the second equation, use the same logic and apply the formula (a+1)^2 = a^2+2*a+1)

The above equation then becomes:

\sum_{i<j} (l2_{X_{p,i}} * l_{X_{p,j}}+l2_{X_{p,j}} * l_{X_{p,i}})

which can be calculated in linear time using a modified form of sliding window.


Create two temp variables t and t2, both initially 0.

Iterate over X_p; let the current node be X_{p,i}. At each step, add t2*l_{X_{p,i}} + t*l2_{X_{p,i}} to the answer. Then, add l_{X_{p,i}} and l2_{X_{p,i}} to t and t2 respectively.

It can be shown easily, that this method correctly calculates the above summation in linear time.

Summing the last equation for all p gives us the desired answer !


Computing array’s l and l2 takes O(N) time. Computing the final equation above, for each p, takes O(|X_p|). The total complexity is thus:

O(N+N+|X_1|+|X_2|+\dots) = O(N+N+N) \approx O(N)


Editorialist’s solution can be found here

Experimental: For evaluation purposes, please rate the editorial (1 being poor and 5 excellent)

  • 1
  • 2
  • 3
  • 4
  • 5

0 voters