- Leetcode 973
- Difficulty: Medium
- Patterns:
Problem Definition
Given an array of $points$ in the format $points[i]=(x_i,y_i)$, return the $k$ points closest to the origin $(0,0)$ by Euclidean distance $\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$.
Examples
Some test cases to verify our solution.
assert(k_closest([(1, 3), (-2, 2)], 1) == [(-2,2)])
assert(set(k_closest([(3, 3), (5, -1), (-2, 4), (5, 15), (8, 19), (5.3, 13.5)], 2)) == set([(3, 3), (-2, 4)]))
assert(set(k_closest([(3, 3), (5, -1), (-2, 4)], 2)) == set([(3, 3), (-2, 4)]))
Heap Solution
We'll use a heap to keep track of the $k$ points closest to the origin as we traverse through the list or points.
Python's heapq.heapify
by default creates a min heap, so we negate the distances to have the default
min heap act as a max heap.
So the coordinates with the greatest distance from the origin will be at front of the heap.
A similar effect can be achieved by using a max heap.
import heapq, math
def get_neg_dist(x: float, y: float) -> float:
return -math.sqrt(x**2 + y**2)
def k_closest(points: list[tuple[float, float]], k: int) -> list[tuple[float, float]]:
n = len(points)
if n <= k:
return points
ans = [(get_neg_dist(x, y), (x, y)) for x, y in points[:k]]
heapq.heapify(ans)
for i in range(k, n):
x, y = points[i]
d = get_neg_dist(x, y)
if d > ans[0][0]:
heapq.heappop(ans)
heapq.heappush(ans, (d, (x, y)))
return [coord for _, coord in ans]
Heap Solution Complexity
Let $n = $ len(points)
.
Construction of the min-heap will take $O(k \cdot log(k))$ time. In the worst case, we might do $n-k$ insertions into the heap. So overall runtime complexity should be $O(k \cdot log(k) + n \cdot log(k) - k \cdot log(k)) = O(n \cdot log(k))$.
Auxiliary space complexity should be $O(k)$ since we're storing $k$ items.
Quick Select Solution
We can iterate over the list of points and create a tuple of form (distance from origin, (x, y))
.
Then we can do a quick select on the distance until we have found
a pivot with the index $k$, which indicates that everything before the item at that index is smaller than it
and everything after is greater.
Then we simply return the first $k$ items' coordinates.
Because we're using quick select, the average runtime will be better than the heap solution but the worst case will be worse.
import math
def get_dist(x: float, y: float) -> float:
return math.sqrt(x**2 + y**2)
def k_closest_quick_select(points: list[tuple[float, float]], k: int) -> list[tuple[float, float]]:
n = len(points)
if n <= k:
return points
arr = [(get_dist(*coord), coord) for coord in points]
def partition(start: int, end: int) -> int:
pivot_idx = random.randint(start, end)
pivot_dist = arr[pivot_idx][0]
arr[end], arr[pivot_idx] = arr[pivot_idx], arr[end]
l, r = start, end
while l < r:
while l < r and arr[l][0] < pivot_dist:
l += 1
while l < r and arr[r][0] >= pivot_dist:
r -= 1
arr[l], arr[r] = arr[r], arr[l]
arr[l], arr[end] = arr[end], arr[l]
return l
def quick_select(start: int, end: int) -> int:
if end - start == 1:
if arr[start][0] > arr[end][0]:
arr[start], arr[end] = arr[end], arr[start]
elif start < end:
p = partition(start, end)
if p < k:
quick_select(p + 1, end)
elif p > k:
quick_select(start, p)
quick_select(0, n - 1)
return [coord for _, coord in arr[:k]]
Quick Select Solution Complexity
Let $n = $ len(points)
.
Runtime complexity should be $O(n^2)$, since this is the worst case runtime for quicksort. E.g., we might pick every other pivot before we pick $k$.
Auxiliary space complexity should be $O(n)$ since we're creating an array of length $n$.