KOL16L - Editorial

meooow · October 30, 2017, 7:29pm

PROBLEM LINK:

Practice
Contest

Editorialist: Soumik Sarkar

DIFFICULTY:

MEDIUM

PREREQUISITES:

Dynamic programming

PROBLEM:

Given the numbers [1..N] to be put in a binary search tree and a sequence of m searches (s_1, s_2,...,s_m), solve the optimal BST problem with the cost as the sum of the distances between every two consecutive searches s_i and s_{i+1} on the BST.

QUICK EXPLANATION:

Solve for the general case where cost is total distance of given of paths using dynamic programming similar to the original optimal BST problem.

EXPLANATION:

It helps to be familiar with the classical optimal BST problem, but it is not necessary. A detailed article on the classical optimal BST problem can be found here.

In the given problem the total cost is the sum of distances between each s_i and s_{i+1} on the tree. However the problem can be solved for the general case of m paths, each from some vertex x to another vertex y. Note that the first path, which is from the root to s_1 has no fixed starting vertex, but we can handle this by assuming the starting vertex is N+1 for reasons which will be clear after understanding the approach.

Consider a vertex u and the edge connecting it to its parent e_u. The contribution of e_u to the cost is the number of times any path passes through it. And the number of paths that pass through it are simply the number of paths that enter or exit the subtree rooted at u. Since this is a binary search tree, the subtree at u will contain values in a particular continuous range. Let this range be [l..r]. Then the number of paths that go out of u's subtree is the number of paths from x to y such that l \le x \le r and 1 \le y < l or r < y \le N. Which is also the total number of paths that start in [l..r] minus the number of paths that start in [l..r] and end in [l..r].

Let c(l_1, r_1, l_2, r_2) denote the number of paths that start in [l_1..r_1] and end in [l_2..r_2]. Then the number of paths going out of u's subtree are c(l, r, 1, N) - c(l, r, l, r). Similarly, the number of paths entering are c(1, N, l, r) - c(l, r, l, r).
So finally, the total contribution of the edge e_u to the cost is c(l, r, 1, N) + c(1, N, l, r) - 2c(l, r, l, r). As you can see this actually depends not on u, but on the range in u's subtree. Let us call this cost(l,r).

Let us define the cost of building up the range of numbers [l..r] into a subtree as f(l,r). If we choose u as the root of this subtree the cost will be f(l, u-1) + cost(l,r) + f(u+1, r). This is what we need to minimize using dynamic programming.

\begin{aligned} f(l,r) &= \min_{l \le u \le r} \{f(l, u-1) + cost(l,r) + f(u+1, r)\} \\ &= cost(l,r) + \min_{l \le u \le r} \{f(l, u-1) + f(u+1, r) \} \end{aligned}

To compute c(l_1, r_1, l_2, r_2) efficiently, we can make a table where cell (x, y) contains the number of paths from x to y. Then c(l_1, r_1, l_2, r_2) is just a subrectangle sum query which can be done using a 2D prefix sum table. So cost(l, r) can be computed in \mathcal{O}(1).
Total complexity is \mathcal{O}(N^3).

EDITORIALIST’S SOLUTION:

Editorialist’s solution can be found here.