TDRIVER - Editorial

mogers · April 19, 2015, 10:42pm

PROBLEM LINK:

Practice
Contest

Author: Konstantsin Sokol
Tester: Gedi Zheng
Editorialist: Miguel Oliveira

DIFFICULTY:

Medium.

PREREQUISITES:

Geometry, Manhattan distance.

PROBLEM

Given a set of N points and constants a and b, calculate the sum of the distances between each pair of points. However, the distance between 2 points is defined as max(a × |x1 - x2|, b × |y1 - y2|).

QUICK EXPLANATION

Transform each coordinate (x, y) into (\frac{a * x + b * y}{2}, \frac{a * x - b * y}{2}).
Calculate the distance between all pairs of points using the Manhattan distance on the new coordinates.

EXPLANATION

Problem MOSTDIST, required a transformation from Manhattan distance metric to maximum metric (editorial here). Here, we do the reverse. Suppose a = b = 1. Then the distance between points A and P is max(|A_x - P_x| , |A_y - P_y|).

max(|A_x - P_x| , |A_y - P_y|) = max(A_x - P_x , P_x - A_x , A_y - P_y , P_y - A_y)

Let’s perform the following variable transformation

\begin{cases} A_x = S_x + S_y \\ A_y = S_x - S_y \\ P_x = T_x + T_y \\ P_y = T_x - T_y \end{cases}

We get

max( Ax - Px , Px - Ax , Ay - Py , Py - Ay ) =

max( (Sx+Sy) - (Tx+Ty), (Tx+Ty) - (Sx+Sy), (Sx-Sy) - (Tx-Ty), (Tx-Ty) - (Sx-Sy) )

Rearrange this into

max( (Sx-Tx) + (Sy-Ty), (Tx-Sx) + (Ty-Sy), (Sx-Tx) + (Sy-Ty), (Tx-Sx) + (Ty-Sy) ) =

max( Sx - Tx , Tx - Sx ) + max( Sy - Ty , Ty - Sy ) =

|Sx-Tx| + |Sy-Ty|

So we were able to get rid of the maximum function and only have a manhattan distance on these 4 variables. Inverting the transformation, we have

\begin{cases} S_x = \frac{A_x + A_y}{2} \\ S_y = \frac{A_x - A_y}{2} \\ T_x = \frac{P_x + P_y}{2} \\ T_y = \frac{P_x - P_y}{2} \end{cases}

Hence, we can modify the given point coordinates using this transformation and then calculate the manhattan distance between all the points using the new coordinate system.

In order to take the parameters a and b into account, we can just multiply each variable in x by a and each variable in y by b.

Note that we can avoid using floating point numbers in the coordinates due to division by 2 by using the transformations new_x = x + y, new_y = x - y and divide the result by 2 in the end.

Time Complexity

We sort the coordinates and then process them in linear time. Hence, the total time complexity is O(N log N).

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution
Tester’s solution

rohanagarwal · April 20, 2015, 11:48am

Nice Solution…!!

raghu_317 · April 20, 2015, 4:32pm

can anyone explain the logic behind the question.

annaqeeb · April 21, 2015, 2:28am

This answer is based on the above tutorial and the [Setter Solution] 2
The names of distances are based on user Khaled Kee and Wikipedia.

For any Type of distances, the summantion of inner distance D for N nodes can be calculated by any one of the following formulas:

D_N =\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} d_{i,j}

= \sum_{i=2}^N \sum_{j=1}^{i-1} d_{i,j}

=\sum_{i=1}^N \sum_{j=1}^{i-1} d_{i,j}

These formulas are simple, d_i,j is distance between Point i and j. If you calculate for all possible values of i and j then you calculated the distance twice. If you choose that
j smaller than i, then it is calculated once. Direct application of any of these formulas is O(n^2).
The last formula is interesting as it can be turned to a recursive formula:

D_i = D_{i-1}+ \sum_{j=1}^{i-1} d_{i,j}

We will get back to this formula later in this answer so keep it in mind.

Before Going further while the previous rule apply to any type of distance, what was meant with type of distance?

Here are 3 known types of distances:

-Euclidean distance : this is the distance between two points, if you have infinite number of directions to move in as in Geometry in mid and High school

d_{i,j}= \sqrt {{x_i -x_j}^2 +{y_i-y_j}^2}

-Manhattan distance : this is the distance between two points, if you have 4 directions to move, for example: rooks and bishops in chessboard. Also called City block distance or Taxicab distance.

d_{i,j}= |{x_i-x_j}| +|{y_i-y_j}|

-Chebyshev distance : this is the distance between two points, if you have 8 directions to move, for example: king in chessboard. Also called Maximum metric or Chessboard distance.

d_{i,j}= max(|{x_i-x_j}| ,|{y_i-y_j}|)

The problem is about calculating total inner Chebyshev distance. Note that Chebychev distance in this problem is scaled by factors (a,b) for the (x,y) coordinates.

Now we need to reduce the time Complexity from O(n^2) to O(n log(n)), how?
First we will transform the coordinates given in the problem , so that the inner Chebyshev distance in the given coordinated (x_C, y_C) is equal to the inner Manhattan distance int the transformed coordinates (x_M,y_M). The transformation is:

(x_M,y_M)= (\frac{a \times x_C+ b \times y_C}{2}, \frac{a \times x_C+ b \times y_C}{2})

Why is Manhattan distance for points P_M = (x_M,y_M)= (\frac{a \times x_C+ b \times y_C}{2}, \frac{a \times x_C+ b \times y_C}{2}) is equal to Chebyshev distance for points P_C =(x_C,y_C) this is already mentioned at Problem MOSTDIST - Editorial. I added another answer to the current question that reviews the proof.

Now back to our main problem, how could this improve complexity?

Summation of Manhattan distance have a very nice properties. First they are separable for each coordinate.
D_i = D_{i-1}+ \sum_{j=1}^{i-1} d_{i,j}

= D_{i-1}+ \sum_{j=1}^{i-1}(|{x_i-x_j}| +|{y_i-y_j}|)

=D_{i-1}^x+ \sum_{j=1}^{i-1} |x_i-x_j| + D_{i-1}^y+\sum_{j=1}^{i-1} |y_i-y_j|

=D_i^x +D_i^y

What if the values are sorted, i.e, if i is larger than j then x_i \geq x_j ,then we can safely remove the absolute value function,as the value is always zero or positive , so:

D_i^x= D_{i-1}^x+ \sum_{j=1}^{i-1}(x_i-x_j)= D_{i-1}^x +(i-1) \times x_i - \sum_{j=1}^{i-1}x_j

D_i^y= D_{i-1}^y+ \sum_{j=1}^{i-1}(y_i-y_j)= D_{i-1}^y +(i-1) \times y_i - \sum_{j=1}^{i-1}y_j

The first calculation requires sorting according to x, the second according to y. this is not a problem as they do not depend on each other at all. last note , do not divide by 2 while transforming as this will cause errors in the integer odd values. instead calculate the whole sum then divide it by 2, in the end. If you have any problems , see my submission . The last summation can be calculated easily from its previous value.

Now the complexity is max( O(sorting), O(summation) )=max( O(n log(n)), O(n))= O(n log(n))

annaqeeb · April 21, 2015, 2:31am

Let’s say the proof of the transformation again.

d_{1,2}^M = |x_1^M-x_2^M| +|y_1^M-y_2^M|=| \frac{a \times x_1^C+b \times y_1^C - a \times x_2^C -b \times y_2^C}{2}| +| \frac{a \times x_1^C -b \times y_1^C - a \times x_2^C +b \times y_2^C}{2}|

= | \frac{a \times (x_1^C -x_2^C) +b \times (y_1^C - y_2^C)}{2} +| \frac{a \times (x_1^C -x_2^C) -b \times (y_1^C - y_2^C)}{2}|

=|q+r|+|q-r|

where q=\frac{a \times (x_1^C -x_2^C)}{2} and r=\frac{b \times (y_1^C -y_2^C)}{2}

for any q and r we can prove

|q+r|+|q-r|= max(q+r, -q-r) +max(q-r, -q+r)

= max(q+r+q-r, q+r-q+r, -q-r+q-r, -q-r-q+r)

= max(2 \times q , 2 \times r, -2 \times r, -2 \times q)

= max( 2 \times max(q, -q), 2 \times max(r,-r))=2 \times max(|q|, |r|)

So applying this in the above equation:

d_{1,2}^M= max(a \times |x_1^C -x_2^C|, b \times |y_1^C - y_2^C|)= d_{1,2}^C

Q.E.D.

himanshujaju · April 20, 2015, 1:19pm

why do we transform the co-ordinates exactly in that fashion ?

amanjain110893 · April 20, 2015, 7:10pm

I have gone through MOSTDIST editorial and I understood it.

But I can’t figure out how logic of MOSTDIST is applied here. Why you are converting coords from (x,y) to ((ax+by)/2,(ax-by)/2)

How sorting xx and yy coords and calculating the manhattan distance separately conserving equation of distance between two points : max(a*|x1-x2|,b*|y1-y2|) ?

mogers · April 20, 2015, 11:10pm

I added an explanation, please take a look.

mogers · April 20, 2015, 11:11pm

I elaborated the coordinate transformation, please take a look.