See all coding puzzles and patterns here.

Problem Definition

Given 2 sparse vectors, which mostly have 0 values, compute their dot product. Do this by implementing the class SparseVector, which supports:

  • an initialization method that takes in an array of numbers representing the sparse vector, and
  • a dot_product method that takes another SparseVector and computes the dot product of the two vectors.

Examples

Some test cases to verify our solution.

v1 = SparseVector([1, 0, 0, 2, 3])
v2 = SparseVector([0, 3, 0, 4, 0])
assert v1.dot_product(v2) == 8
assert v2.dot_product(v1) == 8

l1 = [1] + [0] * 1_000_000 + [1]
l2 = [5, 4, 3, 2, 1, 0]
v1 = SparseVector(l1)
v2 = SparseVector(l2)
assert v1.dot_product(v2) == 5
assert v2.dot_product(v1) == 5

An example computation for the first case: $$ [1,0,0,2,3] \cdot [0,3,0,4,0] = 1\cdot0 + 0\cdot3 + 0\cdot0 + 2\cdot4 + 3\cdot0 = 2\cdot4 = 8 $$


Dictionary Solution

Since the vectors will be sparse, we will store only the populated indexes in a dictionary. To optimize traversal, we'll iterate over the smaller of the two dictionaries during multiplication.

class SparseVector:
    def __init__(self, nums: list[int]):
        self.data = {i: v for i, v in enumerate(nums) if v}

    def dot_product(self, other_vector: 'SparseVector') -> int:
        res = 0
        d, other_d = self.data, other_vector.data
        if len(other_d) < len(d):
            d, other_d = other_d, d
        for i in d:
            if i in other_d:
                res += d[i] * other_d[i]
        return res

Dictionary Solution Complexity

Let $n = $ the number of non-zeros in the first vector, $m = $ the number of non-zeros in the second vector.

Runtime complexity for the dot_produt method should be $\Theta(min(n, m))$.

Auxiliary space complexity for the two vectors should be $\Theta(n, m)$. In general, for a system of vectors ${v_i}$ with number of non-zeros ${n_i}$, the auxiliary space complexity should be $\Theta(\sum(n_i))$.


Binary Search Solution

While the dictionary solution works very well for very sparse vectors, it works less well for less sparse vectors due to memory usage.

For such cases, we can store our sparse vectors as tuples of (index, value). For the dot product, we'll iterate over the smaller of the two and do a binary search for the current index in the larger array.

class SparseVector2:
    def __init__(self, nums: list[int]):
        self.data = [(i, v) for i, v in enumerate(nums)]

    def dot_product(self, other: 'SparseVector') -> int:
        res = 0
        d, other_d = self.data, other.data
        if len(d) > len(other_d):
            d, other_d = other_d, d
        other_idx = 0
        other_max = len(other_d) - 1
        def search_other(left: int, target: int) -> int:
            l, r = left, other_max
            while l < r:
                mid = (l + r) // 2
                if other_d[mid][0] == target:
                    return mid
                if other_d[mid][0] < target:
                    l = mid + 1
                else:
                    r = mid
            return l
        for idx, val in d:
            other_idx = search_other(other_idx, idx)
            if other_d[other_idx][0] == idx:
                res += other_d[other_idx][1] * val
        return res

Binary Search Solution Complexity

Let $n = $ the number of non-zeros in the more sparse vectors, $m = $ the number of non-zeros in the less sparse vector.

Runtime complexity of dot_product should be $\Theta(n \cdot log(m))$.

Auxiliary space complexity should be the same as the dictionary solution above.