Problem Link:Author: Mohammad Shahhoud & Said Sryhini Difficulty:MediumHard Prerequisites :Centroid Decomposition, Sweep Line, Binary Indexed Tree, Simple Math. Problem:Given a tree, each node $i$ has value $A_i$, find the number of node pairs($u$,$v$) such that the median of values in the path from $u$ to $v$ (inclusive) is at most $X$ and the mean is at least $Y$. Explanation:The current described solution based on the setter solution, so you can check the setter solution for the implementation. We will discuss many solutions that are going to lead us to the correct fast solution. Slow solution:Suppose we will loop on each pair of nodes ($u$) and ($v$), so now we have the end nodes fixed, we can do DFS from the starting node and get all the nodes in the path from ($u$) to ($v$) and put them into array, sort the array and get the mean and median, so the time complexity of this solution will be: $O(N^3.Log(N))$ which actually doesn't fit in the time limit. Better solution:We can actually do an optimization for the previous code by writing some math equations: Let $A_i$ be the value of node $i$, so if we create two arrays called $medianValue$ and $meanValue$ of length $N$ where: $medianValue[i] = (A[i] <= X) ? +1 : 1$ $meanValue[i] = A[i]  Y$ The $medianValue$ of node $i$ is +1 if the node value is less or equal to $X$ and 1 otherwise, and the $meanValue$ is $A[i]  Y$, but what's the point of creating those two arrays? To check the median condition of the array, we can simply sum the values of the node values on the path from ($u$) to ($v$) and let's call this value $sumMedian$, so if the $sumMedian$ value is greater or equal to zero, that means the number of values less than or equal to $X$ are greater than or equal to the number of values greater than $X$, so the median is a number less than or equal to $X$. To check the mean condition we can write these equations: $\sum_{i = 1}^{N}A[i] / N \ge Y$ ==> $\sum_{i = 1}^{N}A[i]\ge Y * N$ ==> $\sum_{i = 1}^{N}A[i]  Y * N \ge 0$ ==> $\sum_{i = 1}^{N}(A[i]Y) >= 0$ ==> $\sum_{i = 1}^{N}meanValue[i] >= 0$ So the problem conditions are correct if: $sumMedian \ge 0$ and $sumMean \ge 0$ Now after all that, we can do DFS from each node $i$ passing the sum of $meanValue$ and $medianValue$, checking of the $sumMedian$ and $sumMean$, we can get a solution of complexity: $O(N^2)$. Intended Solution:We will call a pair ($u$,$v$) is good if the conditions of median and mean hold. For better solution, we can do centroid decomposition for the tree. What is Centroid Decomposition?It's a technique used for dividing the tree into trees, each new tree has size at most $N/2$, and the node that divides the tree called centroid, calculating the number of goodpairs passing through this centroid, then call the same procedure on the new trees. you can read more about centroid decomposition from: Quora Thread . Now let's for simplicity consider a binary tree case, in the previous picture, the centroid divides the tree to two trees, but how can we get the number of goodpairs passing through the centroid? By doing DFS from the centroid, we can get for each node the value $sumMean$ and $sumMedian$, where $sumMean[i]$ is the sum of $meanValue[i]$ from the centroid to node $i$ (without centroid value), and $sumMedian[i]$ is the sum of $medianValue[i]$ from the centroid to node $i$ (without centroid value), now we have to find: $sumMedian[u] + sumMedian[v] + medianValue[centroid] \ge 0$ ==> $sumMedian[u] \ge sumMedian[v]  medianValue[centroid]$ and $sumMean[u] + sumMean[v] + meanValue[centroid] \ge 0$ ==> $sumMean[u] \ge sumMean[v]  meanValue[centroid]$ where ($u$) is a node from tree1 and ($v$) is a node from tree2 Let's consider ($sumMean[u]$, $sumMedian[u]$) as a point in 2D Cartisian Plane, so for each node ($v$) from tree2 we have to find the number of points from tree1 in the rectangle that its bottomleft corner is ($sumMean[v]meanValue[centroid]$, $sumMedian[v]medianValue[centroid]$) and its upperright corner is ($Infinity$, $Infinity$) and to solve this efficiently, we can do it using sweep line and Binary Indexed Tree. Sweep Line:First we have to convert the 2D problem to 1D problem for easier and more efficient solution, and we can do that by sorting all points by their $X$ values. Second, we note that each point has two events: 1. Add event: we add the point to the BIT.. The add event at point($sumMean[v]$) 2. Query event: we query to get the number of good points with the current point.. The query event at point($sumMean[v]  meanValue[centroid]$). So each point $v$ has two events, and each event has ($type, mean, median, index$) where $type$ is the type of the event, $mean$ is $sumMean[v]$ for add events and ($sumMean[v]  meanValue[centroid]$) for query events, $median$ is $sumMedian[v]$ for add events and ($sumMedian[v]  medianValue[centroid]$) for query events, index is the index of the owner point of the event. After that, we add all $2N$ events into array, sort them by their mean value and we add all the median values into the BIT. Loop from left to right on events, if the current event type is update, we remove the median value from the BIT, and if the current event type is query, we query on the BIT to find the good points with this point, the query interval is [$Event.median$, $Infinity$]. Important: Note that for query events, all $meanValue[u]$ that are less than the event mean value have removed, and all $meanValue[u]$ that are greater than or equal to the event mean value are still exist in the BIT, so the mean constraint is always true for each query event. There is a case when some point $u$ add point $v$ to the answer and $v$ add point $u$ to the answer (so here we count the same pair twise), so we can handle this case by removing the point from BIT when we query it. There is a problem in this solution; we said before that we want to find the number of paths passing through the centroid, but here we counted also the paths that don't, so we call the same above procedure for each resulting tree to remove those paths from the answer. The time complexity of this solution:The levels of centroid tree is at most $Log(N)$ levels (each time we get the half size of the tree). On each level we loop through all the nodes, calculate the $2N$ events and sort them with complexity $O(N.Log(N))$ Last step, we loop on all events, add $N$ points and remove $N$ points , query $N$ points, each (add, remove, query) operation is $Log(N)$, so the complexity is: $N.Log(N)$ The total complexity: $O(N.Log(N).Log(N))$
This question is marked "community wiki".
asked 21 Oct '17, 23:36
