### Problem Link:

**Author**: Mohammad Shahhoud & Said Sryhini

**Tester**: Hasan Jadouh

**Editorialist**: Mohammad Shahhoud

### Difficulty:

Medium-Hard

### Prerequisites :

Centroid Decomposition, Sweep Line, Binary Indexed Tree, Simple Math.

### Problem:

Given a tree, each node i has value A_i, find the number of node pairs(u,v) such that the **median** of values in the path from u to v (inclusive) is at most X and the **mean** is at least Y.

### Explanation:

The current described solution based on the setter solution, so you can check the setter solution for the implementation.

We will discuss many solutions that are going to lead us to the correct fast solution.

#### Slow solution:

Suppose we will loop on each pair of nodes (u) and (v), so now we have the end nodes fixed, we can do DFS from the starting node and get all the nodes in the path from (u) to (v) and put them into array, sort the array and get the mean and median, so the time complexity of this solution will be: O(N^3.Log(N)) which actually doesn’t fit in the time limit.

#### Better solution:

We can actually do an optimization for the previous code by writing some math equations:

Let A_i be the value of node i, so if we create two arrays called medianValue and meanValue of length N where:

medianValue[i] = (A[i] <= X) ? +1 : -1

meanValue[i] = A[i] - Y

The medianValue of node i is **+1** if the node value is less or equal to X and **-1** otherwise, and the meanValue is A[i] - Y, but what’s the point of creating those two arrays?

To check the median condition of the array, we can simply sum the values of the node values on the path from (u) to (v) and let’s call this value sumMedian, so if the sumMedian value is greater or equal to zero, that means the number of values less than or equal to X are greater than or equal to the number of values greater than X, so the median is a number less than or equal to X. To check the mean condition we can write these equations:

\sum_{i = 1}^{N}A[i] / N \ge Y ==> \sum_{i = 1}^{N}A[i]\ge Y * N ==> \sum_{i = 1}^{N}A[i] - Y * N \ge 0 ==> \sum_{i = 1}^{N}(A[i]-Y) >= 0 ==> \sum_{i = 1}^{N}meanValue[i] >= 0

So the problem conditions are correct if: sumMedian \ge 0 **and** sumMean \ge 0

Now after all that, we can do DFS from each node i passing the sum of meanValue and medianValue, checking of the sumMedian and sumMean, we can get a solution of complexity: O(N^2).

#### Intended Solution:

We will call a pair (u,v) is **good** if the conditions of median and mean hold.

For better solution, we can do **centroid decomposition** for the tree.

#### What is Centroid Decomposition?

It’s a technique used for dividing the tree into trees, each new tree has size at most N/2, and the node that divides the tree called centroid, calculating the number of good-pairs passing through this centroid, then call the same procedure on the new trees. you can read more about centroid decomposition from: Quora Thread .

![tree][8]Now let’s for simplicity consider a binary tree case, in the previous picture, the centroid divides the tree to two trees, but how can we get the number of good-pairs passing through the centroid?

By doing DFS from the centroid, we can get for each node the value sumMean and sumMedian, where sumMean[i] is the sum of meanValue[i] from the centroid to node i (**without centroid value**), and sumMedian[i] is the sum of medianValue[i] from the centroid to node i (**without centroid value**), now we have to find:

sumMedian[u] + sumMedian[v] + medianValue[centroid] \ge 0 ==> sumMedian[u] \ge -sumMedian[v] - medianValue[centroid]

**and**

sumMean[u] + sumMean[v] + meanValue[centroid] \ge 0 ==> sumMean[u] \ge -sumMean[v] - meanValue[centroid]

where (u) is a node from tree1 and (v) is a node from tree2

Let’s consider (sumMean[u], sumMedian[u]) as a point in **2D Cartisian Plane**, so for each node (v) from tree2 we have to find the number of points from tree1 in the rectangle that its bottom-left corner is (-sumMean[v]-meanValue[centroid], -sumMedian[v]-medianValue[centroid]) and its upper-right corner is (Infinity, Infinity) and to solve this efficiently, we can do it using sweep line and Binary Indexed Tree.

#### Sweep Line:

First we have to convert the 2D problem to 1D problem for easier and more efficient solution, and we can do that by sorting all points by their X values.

Second, we note that each point has two events:

- Add event: we add the point to the
**BIT**… The add event at point(sumMean[v]) - Query event: we query to get the number of good points with the current point… The query event at point(-sumMean[v] - meanValue[centroid]).

So each point v has two events, and each event has (type, mean, median, index) where type is the type of the event, mean is sumMean[v] for add events and (-sumMean[v] - meanValue[centroid]) for query events, median is sumMedian[v] for add events and (-sumMedian[v] - medianValue[centroid]) for query events, index is the index of the owner point of the event.

After that, we add all 2N events into array, sort them by their mean value and we add all the median values into the **BIT**.

Loop from left to right on events, if the current event type is update, we remove the median value from the *BIT*, and if the current event type is query, we query on the *BIT* to find the good points with this point, the query interval is [Event.median, Infinity].

**Important**: Note that for query events, all meanValue[u] that are less than the event mean value have removed, and all meanValue[u] that are greater than or equal to the event mean value are still exist in the *BIT*, so the mean constraint is always true for each query event.

There is a case when some point u add point v to the answer and v add point u to the answer (so here we count the same pair twise), so we can handle this case by removing the point from BIT when we query it.

There is a problem in this solution; we said before that we want to find the number of paths *passing through* the centroid, but here we counted also the paths that don’t, so we call the same above procedure for each resulting tree to remove those paths from the answer.

### The time complexity of this solution:

The levels of centroid tree is at most Log(N) levels (each time we get the **half size** of the tree).

On each level we loop through all the nodes, calculate the 2N events and sort them with complexity O(N.Log(N))

Last step, we loop on all events, add N points and remove N points , query N points, each (add, remove, query) operation is Log(N), so the complexity is: N.Log(N)

**The total complexity**: O(N.Log(N).Log(N))