See all coding puzzles and patterns here.

Problem Definition

Given two strings, find the length of their longest common subsequence (LCS). The subsequence does not need to be contiguous in either string.


Examples

Some test cases to verify our solution.

assert lcs("abcdefghi", "zzfghizz") == 4 # "fghi"
assert lcs("abcdefghi", "azbzczdzezfz") == 4 # "abcd"
assert lcs("abc", "xyz") == 0

Solution

This is a dynamic programming problem.

We construct a cache backwards. For every index pair $i$, $j$ for s1 and s2, respectively, the LCS of the substrings starting at those indexes equals:

  1. 1 + lcs(s1[i+1:], s2[j+2]) if s1[i] == s2[j]
  2. max(lcs(s1[i+1:], s2), lcs(s1, s2[j+1:])) otherwise.
So we have to construct the cache backwards, starting from the base case where i and/or j are out of range, in which case, the LCS should be 0. Example calculations below
 |a|b              |c              |
x| |               |               |0
b| |1 + down_right |max(left, down)|0
z| |max(left, down)|max(left, down)|0
 |0|0              |0              |0

Partially filled example matrix below, where we fill the matrix from the bottom right and work toward the top left:

 |a|b|c|d|e|f|g|h|i|
z| | | | | | | | | |0
z| | | | | | | | | |0
f| | | | | | | | | |0
g| | | | | | |3|2|1|0
h|2|2|2|2|2|2|2|2|1|0
i|1|1|1|1|1|1|1|1|1|0
z|0|0|0|0|0|0|0|0|0|0
z|0|0|0|0|0|0|0|0|0|0
 |0|0|0|0|0|0|0|0|0|0

We construct the matrix and then work from the bottom up and from the right to the left, in that order.

def lcs_internal(s1, s2):
    cache = [["" for _ in range(len(s2) + 1)] for _ in range(len(s1) + 1)]
    for i in range(len(s1)-1, -1, -1):
        for j in range(len(s2)-1, -1, -1):
            if s1[i] == s2[j]:
                cache[i][j] = s1[i] + cache[i+1][j+1]
            elif len(cache[i+1][j]) > len(cache[i][j+1]):
                cache[i][j] = cache[i+1][j]
            else:
                cache[i][j] = cache[i][j+1]
    return cache[0][0]


def lcs(s1, s2):
    return len(lcs_internal(s1, s2))

Complexity

Let $n = $ len(s1), $m = $ len(s2).

Runtime complexity should be $O(n \cdot m)$.

Auxiliary space complexity should be $O(n \cdot m)$.