Given a list of 2d points, if any two points have distance(straight line) <= k , group them together. For example. [P1,P2,P3], P1 to P2 <=k, P2 to p3<=k, p1 to p3>k. they are still in the same group. (distance relationship is chainable ) ask how many groups can you find ? I can think of N^2 time complexity with union and find. but how to do better than that? maybe NlogN or O(N)?

# Google phone interview

[Commenting so that the problem gets visiblity.]

Can someone please share if they have a better approach?

+1

I want to see the O(nlogn) approach too.

I guess we can use disjoint union logic here … as i’ve read somewhere it is O(N).

Can you please show the link where you see it? Thanks

What about \{(2,2),(4,4)\}, k=3, your algorithm would see these as different because after dividing we have \{(0,0),(1,1)\} but \sqrt{(4-2)^2+(4-2)^2}=\sqrt{8}<\sqrt{9}=3=k

Yes , you are right.

I have an idea in mind. Please correct me if it is wrong.

First sort points by x axis, then do linear scan for consecutive pair and do union find for distance <=k,

Repeat the step for y axis. That will group all of them in nlogn.

You might however be able to get an average-case \mathcal{O}(n\lg n) algorithm by using a quad-tree quad-tree to find nodes that are close to a given node. This information can then be used to speed up the Union Find algorithm. But all such algorithms I could think of, have an \mathcal{O}(n^2) worst-case: imagine two parallel lines of points with a distance in between slightly more than k.

Thanks for sharing. How about the approach I mentioned?

\{(0,0),(1,8),(2,2),(8,1)\}, k=3 has 3 groups, your algorithm would suggest 4.

The problem here is that the (1,8) and (8,1) always come in between if you order on either the x-axis or y-axis

In your approach if the points are (0, 0), (1, 100), (100, 1), and (2, 2) and k = say 5, and if you started from (0, 0), you’ll miss (2, 2) even though it should be in the same group

Yes you are right

There are various ways to solve this problem divide and conquer(maybe), voronoi diagram or delaunay triangulation( for sure but very hard to code in interview), quad tree data structure,

i have one 2d 2 pointer approach which works in linear time i will post a code when i am done with stress testing it.

**Here goes my algorithm:**

Assumption : We only have integral coordinates points from integers:

Step 1: Pick Random Point (A,B) from array

Step 2: Calculate the distance of remaining elements in form of Polar coordinates (Theta, Distance )

Step 3: Divide the elements of array by distance within k from (A,B) , between k+1, 2k , between 2k+1, 3k …… tk+1, (t+1)k…… Lets call tk+1,(t+1)k a annulus

Step 4: Store the element of particular annulus sorted by polar angle.

Step 5: Finding all points at distance k from A,B is trivial .

Observation

There can be max of O(t) disjoint group at annulus between radius tk+1,(t+1)k

There can be max of O(k^2 * t) elements in particular annulus .

Step 6 : Now in particular annulus t , pick a random point , form circle with diameter k around that contained in annulus , find all points within this circle (all points in this sector can be efficiently found by step 4 ). There can be max of (k^2) elements found . (Represented by start and end index)

Repeat step 5 for particular annulus t , total of O(t) repetitions (k^2 * t )/ (k^2);

Step 7 : At each annulus t there can be max t groups , for each of this t group , Now problem is reduced to finding union of groups between two annulus t , t+1

Step 8: Start from A,B , use step 5 to get all points at distance k from A,B , there are max of O(k^2* 1) point in single group in annulus 1 , so max O(K^2 * t) points in annulus t in single group needs to be compared to Max O( k^2 *(t+1 )) points in single group in annulus t+1.

Result would be all groups union between all annulus found effectively by this step.

Now problem is finding upper bound on number of elements that can be compared between two consecutive annulus , Suppose there are N/ 2 in innermost circle , N/2 in single group in annulus just outermost to it , so for each N/2 element (form circle around it ) in outer annulus find if point lies in area of intersection between these two circle . Finding elements that lie inside circle can be done by choosing k^2 / 2 points out of K^2 points in inner circle that lie inside sector and consecutively testing them . There will be two repetition of K^2 / 2 each at max . So complexity of this = O(k^2 ) , where K^2 = N.

Worst case complexity will occur when :

If all annulus are having each single point in each group (max t groups in annulus t , in observation ) max number of annulus are SQRT(N) (Proof : 1+2+3+4+5……t=N)

This involves constant number of t comparison between two annulus in this case so worst complexity is 1+2+3+4…SQRT(N) = N.

There are two extreme cases of this algo explained one : All groups filled . Second all groups contain single element in each annulus .

So. Time Complexity of this solution is O(N).

Correct me if I am wrong at any step .

How about using k-d tree ?