See all coding puzzles and patterns here.

Problem Definition

Given a string $s$ and a string $t$, find the number of distinct subsequences of $s$ that equal $t$. The subsequences need not be contiguous.


Examples

Some test cases to verify our solution.

assert dist_subseq("abcbc", "abc") == 3 # "abc__", "ab__c", "a__bc"
assert dist_subseq("answwer", "answer") == 2

Solution

This is a dynamic programming problem.

For every index sdx, tdx in $s$ & $t$ respectively, if s[sdx] == t[tdx], then the number of subsequences dp[sdx][tdx] = dp[sdx+1][tdx+1] + dp[sdx+1][tdx]; otherwise, dp[sdx][tdx] = dp[sdx+1][tdx].

We will initialize dp with all entries equal to 0 except where the last character of $t$ matches any character in the substring of $s$. In those cases, it'll be the number of characters that match.

E.g., for dist_subseq("abcbc", "abc"):


    a  b  c  b  c
a [[0, 0, 0, 0, 0]]
b [[0, 0, 0, 0, 0]]
c [[0, 0, 2, 1, 1]]

From there, we will iterate over the remainder of the matrix while skipping the last column and row, since they have already been processed. (The last column should be empty except if the last character of both $s$ and $t$ matches, since we need to satisfy all characters in $t$ and by definition, only the substring t[:-1] being satisfied means $t$ is not.)

def dist_subseq(s, t):
    s_len = len(s)
    t_len = len(t)
    dp = [[0 for _ in range(s_len)] for _ in range(t_len)]
    if s[-1] == t[-1]:
        dp[-1][-1] = 1
    for sdx in range(s_len - 2, -1, -1):
        if s[sdx] == t[-1]:
            dp[-1][sdx] = 1 + dp[-1][sdx+1]
        else:
            dp[-1][sdx] = dp[-1][sdx+1]
    for tdx in range(t_len - 2, -1, -1):
        for sdx in range(s_len - 2, -1, -1):
            if s[sdx] == t[tdx]:
                dp[tdx][sdx] = dp[tdx+1][sdx+1] + dp[tdx][sdx+1]
            else:
                dp[tdx][sdx] = dp[tdx][sdx+1]
    return dp[0][0]

Complexity

Let $n = $ len(s), $m = $ len(t).

Runtime complexity and auxiliary space complexity should be $O(n \cdot m)$.