SUMDIS - Editorial

pkacprzak · March 13, 2017, 2:46am

PROBLEM LINK:

Practice
Contest

Author: Hanlin Ren
Testers: Sergey Kulik, Kamil Dębowski
Editorialist: Pawel Kacprzak

DIFFICULTY:

Hard

PREREQUISITES:

Divide and conquer, sweep line/segment tree

PROBLEM:

G is a directed acyclic graph with N vertices numbered from 1 to N. There are 3 classes of edges in G:

For every 1 \leq i \leq N-1 there is an edge (i, i+1) with weight a_i
For every 1 \leq i \leq N-2 there is an edge (i, i+2) with weight b_i
For every 1 \leq i \leq N-3 there is an edge (i, i+3) with weight c_i

For any pair of vertices s < t, let d(s, t) be the length of the shortest path from s to t in G. The task is to compute the sum of d(s, t) for every 1 \leq s \lt t \leq N.

QUICK EXPLANATION:

Use divide and conquer to remove 3 central nodes, which disconnects the graph, solve the problem recursively into smaller graphs and combine the results into the final result.

EXPLANATION:

First of all, notice that the graph G is a directed acyclic graph and has O(N) edges with a very small constant.

Subtask 1

In the first subtask, we have N \leq 10^3 and sum of N over all test cases not greater than 10^4. This allows the most straightforward method of iterating over all vertices, and for a fixed vertex i, compute the sum of distances from i to all other vertices j > i. This can be easily done in O(N) time for a fixed vertex i because the graph is a directed acyclic graph (see here for more details). In our case, it’s even easier because the topologic order of G is just 1, 2, \ldots, N.

Subtask 2

In the second subtask, we know that there is a specific relation between weight of edges in all classes:

b_i = a_i + a_{i+1}
c_i = a_i + a_{i+1} + a_{i+2}

If we take a closer look at above equations, we can notice that if P is the shortest path from i to j>i, and P uses at least one edge of the second or the third class, then there exists another path P' of the same length that P, such that P' use only edges from the first class. This is true because any edge of the second class in P can be replaced by two consecutive edges of the first class without changing the length of the path. Similarly, each edge of the third class in P can be replaced by three consecutive edges of the first class without changing the length of the path. Thus, we can reduce the graph to a path graph where vertices i and i+1 are connected by an edge with weight a_i and there are no other edges in the graph. Moreover, between any two vertices i < j there is now exactly one path.

How to solve this simpler problem on the path graph? Well, it’s pretty easy. An Edge (i,i+1) for i=1,2, \ldots, N-1 is taken into the final result i \cdot (N-i) times because it is a part of that many shortest paths connecting vertices with indices x \leq i with vertices with indices y \geq i+1.

Subtask 3

In the third subtask, we also know something specific about relation of weight of edges from the first and the third class:

c_i = a_i + a_{i+1} + a_{i+2}

Using a similar reasoning as in the solution for the second subtask, we can reduce the input graph to a graph with edges only from the first and the second class. The solution for such a graph is the same as the solution for the third subtask, except the fact that one crucial step is easier to perform. Please read the description of the intended solution below to get the overall idea, and the distinction between these two solutions it also pointed out there.

Subtask 4

In the last subtask, we have the original constraints. The idea here is to use divide and conquer approach in order. Let’s assume that we are solving the problem for an induced graph G with vertices from range [L, R] only. Clearly, the final result will be the solution for a graph with vertices in a range [1, N]. Now, the crucial observation comes:

Let m be some vertex such that L < m < R. Then, if we remove vertices m-1, m and m+1 from G, then we get two disconnected graphs G_L and G_R denoted by vertices in ranges respectively [L, m-2] and [m+2, R]. Thus, we can solve the problem recursively for both of them, add sub-results computed for these sub-problems, and the only remaining thing to do is to compute:

f(l, r, m) := \sum_{l \leq i < m < j \leq r} d(i, j)
g(l, r, m) := \sum\limits_{l \leq i < m-1} d(i,m-1) + \sum\limits_{l\leq i < m} d(i,m)+ \sum\limits_{m < i \leq r} d(m, i) + \sum\limits_{m+1< i \leq r} d(m+1,i)

where f(l, r, m) is the sum of all shortest paths from vertices with numbers smaller than m to vertices with numbers larger than m, while g(l, r, m) is the sum of all shortest paths to vertices m-1 and m and from vertices m and m+1.

The result for graph G denoted by [L, R] is then the sum of subresults for graphs G_1 and G_2 plus f(l,r, m) plus g(l,r,m).

One more thing before we take a look at how to compute values of f and g. Notice that as often in divide and conquer approaches, we want to divide the problem into as equal (in terms of size) sub-problems as we can. Here, picking m = (L+R)/2 does the job because it reduces the size of the problem by a factor of 2 every time. Moreover, when the size of a sub-problem is small enough, so R-L+1 < C, where C is some constant, we can solve the problem with a quadratic solution described as the solution to the first subtask. This is a common technique used to speed up a solution in practice.

Now, how to compute values of f and g defined above? Let’s start with g because it’s easier.

Notice, that g is just a sum of three different sums. Each of them is a sum of the shortest paths to or from a fixed vertex, so a single such sum can be computed easily in O(N) time by solving a standard shortest path problem in a DAG.

Now the harder part, computing f. Let’s consider two vertices i < m and j > m. Then, d(i, j) is the smallest of the following 3 values:

d_1(i, j) := d(i, m-1) + d(m-1, j)
d_2(i, j) := d(i, m) + d(m, j)
d_3(i, j) := d(i, m+1) + d(m+1, j)

Thus, we are interested, for all i < m and j > m, how many times d(i, m-1) contributes to the final result, so how many times path d(i, m-1) is a part a a shortest path in all shortest paths counted in f(l, r, m). Similarly, we want to count the same for d(m-1, j), d(i, m), d(m, j), d(i, m+1), and d(m+1, j). We are going to show how to do it only for d(i, m-1), because other cases are very similar.

Now, we are interested in how many times d(i, m-1) is counted into f(l,r,m), which is equal to the number of times d_1(i, j) is smaller than d_2(i,j) and also is smaller than d_3(i,j). If any two paths, for example d_1(i, j) and d_2(i, j) are equal, we say that the one with smaller subscript is smaller, so d_1(i,j) is smaller in this case. In other words, for a fixed node i, we are interested in number of j, such that m < j \leq r, and both below equations are fullfilled:

d(i, m-1) + d(m-1, j) \leq d(i, m) + d(m, j)
d(i, m-1) + d(m-1, j) \leq d(i, m+1) + d(m+1, j)

above equations can be rewriten as:

d(m-1, j) - d(m,j) \leq d(i, m) - d(i, m-1)
d(m-1, j) - d(m+1,j) \leq d(i, m+1) - d(i, m-1)

Now, if we let:

x_0 = d(i,m) - d(i,m-1)
y_0 = d(i,m+1) - d(i,m-1)

and

x_j = d(m-1,j) - d(m,j)
y_j = d(m-1,j) - d(m+1,j)

Then we are interested in the number of points (x_j, y_j) in 2d plane, such that x_j \leq x_0 and y_j \leq y_0. This last problem can be solved with for example a sweep line algorithm, sweeping through all O(R-L) points in sorted order by their x coordinates and maintaining a set of all points seen so far sorted by their y coordinate. It follows, that the time needed to compute f(l, r, m) is O((R-L) \cdot \log (R-L)), and its dominated by the time of sorting the points and performing query operations to the set sorted by y coordinates.

Now, we showed how to decompose a problem using divide and conquer and how to combine the results computed for subproblems with additional values, f and g, into the final result. The total time complexity of this approach can be written as T(N) = 2 \cdot T(N/2) + O(N \log N), which is equal to O(N \log^2 N).

For implementation details please refer to both setter’s and tester’s solutions.

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution can be found here.
Second tester’s solution can be found here.

linkret · March 13, 2017, 6:07pm

I think this is a very good problem!

abx_2109 · March 13, 2017, 10:29pm

In the subtask 2 in the editorial above “An Edge (i,i+1)(i,i+1) for i=1,2,…,N−1i=1,2,…,N−1 is taken into the final result i⋅(N−1) times because it is a part of that many shortest paths connecting vertices with indices x≤ix≤i with vertices with indices y≥i+1y≥i+1.” It should be i(N-i) instead of i(N-1).

swamicoder · March 14, 2017, 10:31pm

Apologies.I have solved this problem for second subtask, still not been able to figure out where I am wrong.

I have printed sum of i*(N-i) for all vertices of a[i].

Here is Link:-CodeChef: Practical coding for everyone

abhikalpu_123 · May 25, 2017, 1:28am

nice editorial…

pkacprzak · March 13, 2017, 6:11pm

Yeah, it’s really awesome

pkacprzak · March 13, 2017, 10:48pm

Corrected, thanks a lot for pointing it out

pkacprzak · March 14, 2017, 10:54pm

Take a closer look when you do: “long long int p= a[i]i(n-i);” Notice that a[i], i, and n are declared as 32-bit integers in your code, so the result of their multiplication is also 32-bit integer and you have overflow

swamicoder · March 14, 2017, 11:14pm

Thanks. That was silly mistake. I am have tried atleast 4-5 times(but was not able to notice overflow). This is the best problem I have encountered this month.