See all coding puzzles and patterns here.

Problem Definition

Given an array of $points$ in the format $points[i]=(x_i,y_i)$, return the $k$ points closest to the origin $(0,0)$ by Euclidean distance $\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$.


Examples

Some test cases to verify our solution.

assert(k_closest([(1, 3), (-2, 2)], 1) == [(-2,2)])
assert(set(k_closest([(3, 3), (5, -1), (-2, 4), (5, 15), (8, 19), (5.3, 13.5)], 2)) == set([(3, 3), (-2, 4)]))
assert(set(k_closest([(3, 3), (5, -1), (-2, 4)], 2)) == set([(3, 3), (-2, 4)]))

Heap Solution

We'll use a heap to keep track of the $k$ points closest to the origin as we traverse through the list or points.

Python's heapq.heapify by default creates a min heap, so we negate the distances to have the default min heap act as a max heap. So the coordinates with the greatest distance from the origin will be at front of the heap. A similar effect can be achieved by using a max heap.

import heapq, math

def get_neg_dist(x: float, y: float) -> float:
    return -math.sqrt(x**2 + y**2)

def k_closest(points: list[tuple[float, float]], k: int) -> list[tuple[float, float]]:
    n = len(points)
    if n <= k:
        return points
    ans = [(get_neg_dist(x, y), (x, y)) for x, y in points[:k]]
    heapq.heapify(ans)
    for i in range(k, n):
        x, y = points[i]
        d = get_neg_dist(x, y)
        if d > ans[0][0]:
            heapq.heappop(ans)
            heapq.heappush(ans, (d, (x, y)))
    return [coord for _, coord in ans]

Heap Solution Complexity

Let $n = $ len(points).

Construction of the min-heap will take $O(k \cdot log(k))$ time. In the worst case, we might do $n-k$ insertions into the heap. So overall runtime complexity should be $O(k \cdot log(k) + n \cdot log(k) - k \cdot log(k)) = O(n \cdot log(k))$.

Auxiliary space complexity should be $O(k)$ since we're storing $k$ items.


Quick Select Solution

We can iterate over the list of points and create a tuple of form (distance from origin, (x, y)). Then we can do a quick select on the distance until we have found a pivot with the index $k$, which indicates that everything before the item at that index is smaller than it and everything after is greater. Then we simply return the first $k$ items' coordinates. Because we're using quick select, the average runtime will be better than the heap solution but the worst case will be worse.

import math

def get_dist(x: float, y: float) -> float:
    return math.sqrt(x**2 + y**2)

def k_closest_quick_select(points: list[tuple[float, float]], k: int) -> list[tuple[float, float]]:
    n = len(points)
    if n <= k:
        return points
    arr = [(get_dist(*coord), coord) for coord in points]
    def partition(start: int, end: int) -> int:
        pivot_idx = random.randint(start, end)
        pivot_dist = arr[pivot_idx][0]
        arr[end], arr[pivot_idx] = arr[pivot_idx], arr[end]
        l, r = start, end
        while l < r:
            while l < r and arr[l][0] < pivot_dist:
                l += 1
            while l < r and arr[r][0] >= pivot_dist:
                r -= 1
            arr[l], arr[r] = arr[r], arr[l]
        arr[l], arr[end] = arr[end], arr[l]
        return l
    def quick_select(start: int, end: int) -> int:
        if end - start == 1:
            if arr[start][0] > arr[end][0]:
                arr[start], arr[end] = arr[end], arr[start]
        elif start < end:
            p = partition(start, end)
            if p < k:
                quick_select(p + 1, end)
            elif p > k:
                quick_select(start, p)
    quick_select(0, n - 1)
    return [coord for _, coord in arr[:k]]

Quick Select Solution Complexity

Let $n = $ len(points).

Runtime complexity should be $O(n^2)$, since this is the worst case runtime for quicksort. E.g., we might pick every other pivot before we pick $k$.

Auxiliary space complexity should be $O(n)$ since we're creating an array of length $n$.